Abstract

The  goal  is  to  compute  eigenvectors  of  a  symmetric  tridiagonal 
matrix  T  that  are  orthogonal  to  working  accuracy.  Consider  a  cluster 
of  m  very  close  eigenvalues  that  are  reasonably  well  separated  from 
the  remaining  spectrum.  We  show  here  that  there  are  m  principal 
submatrices  of  T  such  that  only  the  nearest  neighbors  overlap  and 
each  submatrix  has  a  simple,  isolated  eigenvalue  in  the  convex  hull 
of  the  cluster  with  eigenvector  having  small  entries  in  the  first  and 
last  positions.  This  eigenvector  is  padded  with  zero  entries,  above 
and  below,  to  make  it  conform  to  T.  The  set  of  vectors,  one  from 
each  submatrix,  forms  a  good  basis  for  the  invariant  subspace.  Each 
basis  vector  may  be  modified,  if  necessary,  by  its  nearest  neighbors  to 
produce  an  orthonormal  basis. 

The  only  communication  that  may  be  needed,  in  such  situations, 
is  between  nearest  neighbors. 

We  give  a  good  bound  on  the  dot  product  of  nearest  neighbors.  A 
variety  of  examples  illustrate  the  theory. 

The  ideas  in  this  paper  were  presented  at  the  ENUMATH  meeting  at 
CNRS,  Paris,  in  September  1995  and  at  the  ILAY  workshop  at  Cerfacs  in 
October,  1995. 
1  Introduction

‘Inverse Iteration gives a very satisfactory solution to the
problem as far as reasonably well separated eigenvalues are concerned.
The problem of determining reliably full digital information
in the subspace spanned by eigenvectors corresponding
to coincident or pathologically close eigenvalues has never been
satisfactorily solved.’

J.  H.  Wilkinson 

(from  ‘The  Algebraic  Eigenvalue  Problem’,  1965, 

Chapter  5,  p.  344.) 

This  quotation  is  over  30  years  old  and  yet  most  experts  would  agree  that 
its  claim  is  still  true  in  1995. 

Our  goal  is  to  build  up  an  orthonormal  basis  for  the  invariant  subspace 
associated  with  a  reasonably  isolated  cluster  of  very  close  eigenvalues  of  a 
symmetric tridiagonal matrix T. This is achieved by the QR algorithm and
it is only the relatively high cost of accumulating all the plane rotations that
drives the search for other techniques. Certain codes (e.g., xstein) currently
used in LAPACK, and other libraries such as NAG, IMSL and ESSL,
decline  in  both  efficiency  and  quality  of  output  as  eigenvalues  get  closer  to 
each  other.  The  reason  is  that  these  codes  are  based  on  inverse  iteration  and 
it  is  extremely  difficult  to  choose  automatically  suitable  right  hand  sides  to 
ensure  both  the  spanning  property  (accuracy)  and  orthogonality. 

One  way  out  of  the  difficulty  is  to  discard  an  appropriate  set  of  rows  from 
the  top  and  bottom  of  T  and  to  work  with  the  remaining  submatrix  to  obtain 
a  basis  vector.  The  well  known  test  matrix  that  is  discussed  in  Section  5 
has  its  largest  two  eigenvalues  equal  to  single  (and  double)  precision.  In  this 
easy  case  it  suffices  to  compute  the  eigenvectors  of  the  submatrices  in  rows 
1  to  19  and  3  to  21,  append  zero  entries  at  the  bottom  of  one  and  the  top  of 
the  other,  and  finally  deliver  the  internal  and  external  bisectors  of  these  two 
vectors.  Even  the  tiny  entries  are  computed  to  high  relative  accuracy. 

The guiding principle behind this approach is that an eigenvector associated
with a simple, isolated eigenvalue is easy to compute. So, for a cluster
of m close eigenvalues, the task is to find m different submatrices of T each
of which has an isolated eigenvalue in the convex hull of the cluster. That is
not enough. The eigenvector of any internal submatrix must have small entries
in its first and last positions. It is not obvious that these specifications
can always be met and the goal of this paper is to provide the theory that
supports our use of submatrices.

It  turns  out  that  the  case  of  close  pairs  is  fundamental.  The  general  case 
may  be  reduced  to  considering  a  number  of  pairs.  A  surprising  outcome  of 
these  investigations  is  that  the  difficulty  of  computing  an  accurate  orthogonal 
basis  is  not  properly  measured  by  the  gap  between  eigenvalues.  The  support 
of  the  vectors  (the  positions  holding  nonnegligible  values)  plays  an  important 
role.  Strictly  speaking  zero  entries  in  eigenvectors  are  extremely  rare  and, 
when  they  do  occur,  they  are  isolated.  Since  we  ignore  isolated  zero  entries 
we  might  conclude  that  the  support  of  all  the  eigenvectors  we  seek  is  the  full 
index  set.  To  take  such  a  pedantic  view  would  be  to  miss  an  important  fact 
about  many,  but  not  all,  n  x  n  tridiagonal  matrices  as  n  grows:  the  active 
part  of  the  eigenvector  is  confined  to  a  small  part  of  the  domain,  perhaps 
only  30  or  40  consecutive  positions.  The  remaining  entries  carry  a  tiny  bit 
of  noise.  Consequently  we  use  the  term  support  somewhat  informally.  If  the 
support  of  two  eigenvectors  is  disjoint  then  they  are  orthogonal  however  close 
the  associated  eigenvalues  may  be. 

Closely related to the idea of support is the concept of the overlap of two
vectors x and y,

$$\mathrm{Overlap}(x,y) := \frac{|x|^{*}\,|y|}{\|x\|\,\|y\|},$$

the cosine between the vectors of absolute values. Our two main results are:

1. Each normalized vector x produced from an appropriate submatrix by
appending zeros has $\rho := x^{*}Tx$ in the cluster interval and

$$\|(T - \rho I)x\| = O(\text{cluster length}).$$

2. Any two normalized vectors x and y from overlapping submatrices satisfy

$$\mathrm{Overlap}(x,y) = O\!\left(\left(\frac{\text{cluster length}}{\text{gap}}\right)^{1/2}\right),$$

where gap is the separation of the cluster from the remaining spectra of the
two submatrices.
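The distinction between overlap and the ordinary dot product can be seen in a few lines of Python. This is only an illustrative sketch: the function name `overlap` and the sample vectors are ours, not the paper's.

```python
import math

def overlap(x, y):
    """Cosine of the angle between |x| and |y| (entrywise absolute values)."""
    num = sum(abs(a) * abs(b) for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return num / (nx * ny)

# Disjoint supports: the overlap is zero, so the vectors are orthogonal.
x = [0.6, 0.8, 0.0, 0.0]
y = [0.0, 0.0, -1.0, 0.0]
disjoint = overlap(x, y)

# Same support: the dot product vanishes but the overlap is (nearly) 1.
u = [1.0, 1.0]
v = [1.0, -1.0]
same_support = overlap(u, v)
dot_uv = sum(a * b for a, b in zip(u, v))
```

A small overlap therefore forces a small dot product, while the converse fails, which is why a bound on the overlap is the stronger statement.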

Submatrices  only  overlap  their  nearest  neighbors.  Physicists  would  say 
that  the  overlap  matrix  is  tridiagonal  and  close  to  the  identity.  However 
the subspace spanned by these subvectors is more accurate than indicated
by  the  residual  norms  of  the  subvectors  separately.  That  is  the  content  of 
Section  9.  What  Results  1  and  2  show  is  that  the  subvectors  yield  a  sparse, 
nearly  orthogonal  basis  for  a  good  subspace. 

The results developed here are pure matrix theory; there is no reference
to machine precision. They depend strongly on the tridiagonal form.

The first result is somewhat surprising because experience with the Lanczos
algorithm had suggested that the best that could be guaranteed for any
vector x obtained from a submatrix was $\|(T - \mu I)x\| = O(\sqrt{\text{cluster length}})$
and that is not strong enough for our purposes.

Tight clusters of close eigenvalues are not the only spectral distributions
to pose challenges to numerical analysts. Consider eigenvalues that form a
geometric progression. This case is not troublesome if the eigenvalues are
computed to high relative accuracy. A different problem is posed
by small perturbations of the identity matrix I. How large must the perturbation
be before the user will not accept the columns of I as suitable
eigenvectors? Shifting by I and scaling by the largest remaining entry makes
the perturbations seem important and the shifted and scaled eigenvalues
seem well separated. One would then compute a set of eigenvectors very
different from the columns of I. It would seem that only the user can choose
between the two and we shall not pursue this question here.

We begin our analysis in Section 6 by introducing the envelope of a cluster
and showing how it guarantees Result 1 with a constant bounded by $\sqrt{n/2}$
but normally much smaller. The envelope also reveals good choices for submatrices.
Then we take a different approach to show how the submatrix
indices affect the constants behind the O(cluster length). Our results make
heavy use of detailed properties of tridiagonal matrices and that material is
gathered in Section 3 and Section 10.

The  reader  is  urged  to  read  Section  2  on  notation. 


2  Notation 


$$T = \mathrm{tridiag}\begin{pmatrix} \beta_1 & \beta_2 & \cdots & \beta_{n-1} & \\ \alpha_1 & \alpha_2 & \cdots & \alpha_{n-1} & \alpha_n \\ \beta_1 & \beta_2 & \cdots & \beta_{n-1} & \end{pmatrix}$$

Eigenvalues: $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$. (1)

Normalized eigenvectors: $s_1, s_2, \ldots, s_n$. (2)

When $\beta_i \ne 0$, $i = 1, \ldots, n-1$, then T is unreduced. In that case the inequalities
in (1) are strict.

The principal submatrix of T in rows $j, j+1, \ldots, k$ is denoted by either
$T^{j:k}$ or, simply, by $(j:k)$. The eigenvalues of a submatrix $(j:k)$ are called
Ritz values or R-values:

$$\theta_1^{j:k} < \cdots < \theta_{k-j+1}^{j:k}. \qquad (3)$$

The normalized eigenvector for $\theta_i^{j:k}$ is $s_i^{j:k}$. The mth entry is $s_i^{j:k}(m)$. The
natural way to index the entries of $s_i^{j:k}$ is from j to k, not from 1 to k - j + 1.
In this way $s_i^{j:k}$ is embedded in a vector conformable to T and we consider
entries in positions $1 : j-1$ and $k+1 : n$ as zero. Sometimes we write $\theta_i^{1:j}$ as
$\theta_i^{(j)}$. To simplify further we write $\theta^{j:k}$ for $\theta_i^{j:k}$ when $\theta_i^{j:k}$ lies in $[\lambda_-, \lambda_+]$, our
cluster.

The characteristic polynomial of $T^{j:k}$, or $(j:k)$, is defined by

$$\chi^{j:k}(\tau) := \det(\tau I - T^{j:k}).$$

We write $\chi_i$ for $\chi^{1:i}$. In general column vectors are denoted by lower case Roman
letters in boldface type: v, z, ..., and individual entries are v(j), z(l), ....
The size of the identity matrix I is given by the context and its columns are

$$e_i = (0, \ldots, 0, 1, 0, \ldots, 0)^{*}, \qquad 1 \text{ in position } i.$$

3  Eigenvectors  and  Error  Estimates 

The  unreduced  n  x  n  matrix  T  is  uniquely  determined  (up  to  signs)  by 
its spectrum $\{\lambda_1, \ldots, \lambda_n\}$ and (n - 1) appropriate extra items. This extra
information  may  take  various  forms:  the  spectrum  of  submatrix  (1  :  n  —  1) 
or  (2  :  n),  or  the  squares  of  the  top  (or  bottom)  entries  of  the  normalized 
eigenvectors.  See  [4]  or  [2]. 

Consequently  there  are  numerous  expressions  for  the  eigenvectors  of  T 
and  we  give  some  of  them  here. 

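Here we assume that T is unreduced: $\beta_i \ne 0$, $i = 1, \ldots, n-1$.
Perhaps the oldest formulae for the unnormalized eigenvector for $\lambda$ are

$$\left(1,\ \frac{\chi^{1:1}(\lambda)}{\beta_1},\ \frac{\chi^{1:2}(\lambda)}{\beta_1\beta_2},\ \ldots,\ \frac{\chi^{1:n-1}(\lambda)}{\beta_1\beta_2\cdots\beta_{n-1}}\right)^{*} \qquad (4)$$

$$\left(\frac{\chi^{2:n}(\lambda)}{\beta_1\beta_2\cdots\beta_{n-1}},\ \ldots,\ \frac{\chi^{n:n}(\lambda)}{\beta_{n-1}},\ 1\right)^{*} \qquad (5)$$

The catch to using these formulae is that they can overflow very easily and
they may sometimes be extremely sensitive to tiny changes in $\lambda$.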
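Formula (4) need not be evaluated through determinants: its entries satisfy the three-term recurrence coming from the rows of $(T - \lambda I)v = 0$. The sketch below is ours, not the paper's code; it applies that recurrence to the second-difference matrix tridiag(1, -2, 1), whose eigenpairs are known in closed form, so the result can be checked. The function names and the choice n = 8 are illustrative assumptions.

```python
import math

def eigvec_from_top(alpha, beta, lam):
    """Unnormalized eigenvector as in (4): v(1) = 1 and
    v(j+1) = chi^{1:j}(lam) / (beta_1 ... beta_j), generated by the
    three-term recurrence rather than by explicit determinants."""
    n = len(alpha)
    v = [1.0]
    if n > 1:
        v.append((lam - alpha[0]) / beta[0])
    for j in range(1, n - 1):
        v.append(((lam - alpha[j]) * v[j] - beta[j - 1] * v[j - 1]) / beta[j])
    return v

def residual(alpha, beta, lam, v):
    """Entries of (T - lam*I) v for the tridiagonal T."""
    n = len(alpha)
    r = []
    for i in range(n):
        s = (alpha[i] - lam) * v[i]
        if i > 0:
            s += beta[i - 1] * v[i - 1]
        if i < n - 1:
            s += beta[i] * v[i + 1]
        r.append(s)
    return r

n = 8
alpha = [-2.0] * n
beta = [1.0] * (n - 1)
theta = math.pi / (n + 1)
lam = -2.0 + 2.0 * math.cos(theta)      # rightmost eigenvalue
v = eigvec_from_top(alpha, beta, lam)
r = residual(alpha, beta, lam, v)
```

For this small, well-conditioned example the recurrence is harmless. The warning in the text concerns large n or rapidly decaying eigenvectors, where the same recurrence can overflow or amplify tiny errors in $\lambda$.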

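There are attractive formulae for the magnitudes of the entries of normalized
eigenvectors $s_1, \ldots, s_n$. For each $i = 1, 2, \ldots, n$, and $\chi = \chi^{1:n}$,

$$s_i(1)\,s_i(n)\,\chi'(\lambda_i) = \beta_1\beta_2\cdots\beta_{n-1}, \qquad (6)$$

$$s_i(1)^2\,\chi'(\lambda_i) = \chi^{2:n}(\lambda_i), \qquad (7)$$

$$s_i(n)^2\,\chi'(\lambda_i) = \chi^{1:n-1}(\lambda_i), \qquad (8)$$

$$s_i(j)^2\,\chi'(\lambda_i) = \chi^{1:j-1}(\lambda_i)\,\chi^{j+1:n}(\lambda_i). \qquad (9)$$

All these formulae come from a result in [4] that

$$\mathrm{adj}(\lambda_i I - T) = s_i s_i^{*}\,\chi'(\lambda_i),$$

where adj(M) is the classical adjugate of M.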
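Identities (6) and (7) can be checked numerically on a matrix whose eigenpairs are known exactly. The following sketch (ours, with hypothetical helper names `eig` and `vec`) uses the second-difference matrix tridiag(1, -2, 1), for which every $\beta_i = 1$ and every principal submatrix is again a second-difference matrix. The index k below simply enumerates the eigenpairs; it does not follow the left-to-right labelling of (1).

```python
import math

n = 6

def eig(k, m):
    """k-th eigenvalue (k = 1..m) of the m x m second-difference matrix."""
    return -2.0 + 2.0 * math.cos(k * math.pi / (m + 1))

def vec(k, m):
    """Matching normalized eigenvector, entries sin(j*k*pi/(m+1))."""
    c = math.sqrt(2.0 / (m + 1))
    return [c * math.sin(j * k * math.pi / (m + 1)) for j in range(1, m + 1)]

check6, check7 = [], []
for k in range(1, n + 1):
    lam = eig(k, n)
    s = vec(k, n)
    # chi'(lam_k) is the product of (lam_k - lam_i) over the other eigenvalues.
    dchi = math.prod(lam - eig(i, n) for i in range(1, n + 1) if i != k)
    # chi^{2:n}(lam_k): the trailing block has order n - 1 and known zeros.
    chi_2n = math.prod(lam - eig(i, n - 1) for i in range(1, n))
    check6.append(s[0] * s[-1] * dchi)        # (6): equals beta_1...beta_{n-1} = 1
    check7.append(s[0] ** 2 * dchi - chi_2n)  # (7): should vanish
```

Note that (6) fixes the sign of the product $s_i(1)s_i(n)$ as well as its magnitude, which is the starting point for the sign discussion that follows.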

These  results  raise  the  question  of  how  to  find  the  correct  signs  of  the 
entries  and  that  leads  to  the  next  few  observations. 

Given any unreduced T there is a unique $\Delta = \mathrm{diag}(\pm 1, \pm 1, \ldots, \pm 1)$ such
that $\hat{T} = \Delta T \Delta$ has positive off-diagonal entries. Observe that

$$Ts = s\lambda \iff \hat{T}(\Delta s) = (\Delta s)\lambda.$$

Consequently it is possible to normalize any symmetric T so that it becomes
a direct sum of unreduced tridiagonals each of which has positive off-diagonal
entries. From now on we assume that $\beta_i > 0$, $i = 1, \ldots, n-1$.

These T's are oscillation matrices, a term coined by Krein and Gantmacher
[1]. The number of sign changes among consecutive entries of the eigenvector
$s_i$ is $n - i$. Recall that we label eigenvalues so that

$$\lambda_1 < \lambda_2 < \cdots < \lambda_n.$$
For example, for the second difference SINE matrix (1, -2, 1), the eigenvalues
lie in $-4 < \lambda_i < 0$ and the fundamental mode is given by $s_n$, associated with
the rightmost eigenvalue. We use right and left rather than large or small in
order to be translation invariant. For some people it is unnatural to say that
-1 is larger than -2 because they think of magnitudes when the word large
is used.

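The correct signs may be attached to magnitudes when certain information
is available. A warning is in order here. Although the characteristic
polynomial is defined by $\chi^{1:j}(\tau) = \det(\tau I - T^{1:j})$, most people compute with
$T - \tau I$. For example, the pivots in Gaussian elimination are usually computed
from the recurrence

$$d_1 = \alpha_1 - \tau, \qquad d_{i+1} = (\alpha_{i+1} - \tau) - \beta_i^2/d_i,$$

and so

$$\chi^{1:j}(\tau) = (-1)^j d_1 \cdots d_j.$$

If $\tau = \lambda_i$, (4) may be rewritten as

$$\left(1,\ -\frac{d_1}{\beta_1},\ \frac{d_1 d_2}{\beta_1\beta_2},\ \ldots,\ (-1)^{n-1}\frac{d_1\cdots d_{n-1}}{\beta_1\cdots\beta_{n-1}}\right)^{*}.$$

However, in practice $\tau \ne \lambda_i$.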
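The sign convention is easy to get wrong, so here is a short numerical check of the relation between the pivots of $T - \tau I$ and the characteristic polynomials of the leading submatrices. It is a sketch under our own naming (`pivots`, `char_polys`); the 4 x 4 matrix and the shift are arbitrary test data.

```python
def pivots(alpha, beta, tau):
    """Pivots d_1..d_n of T - tau*I: d_{i+1} = (alpha_{i+1} - tau) - beta_i^2 / d_i."""
    d = [alpha[0] - tau]
    for i in range(1, len(alpha)):
        d.append((alpha[i] - tau) - beta[i - 1] ** 2 / d[i - 1])
    return d

def char_polys(alpha, beta, tau):
    """chi^{1:j}(tau) = det(tau*I - T^{1:j}) for j = 1..n, by the three-term recurrence."""
    chi = [tau - alpha[0]]
    prev = 1.0                      # chi of the empty matrix
    for j in range(1, len(alpha)):
        nxt = (tau - alpha[j]) * chi[-1] - beta[j - 1] ** 2 * prev
        prev = chi[-1]
        chi.append(nxt)
    return chi

alpha = [4.0, 3.0, 2.0, 1.0]
beta = [1.0, 0.5, 2.0]
tau = 0.3
d = pivots(alpha, beta, tau)
chi = char_polys(alpha, beta, tau)

# Compare chi^{1:j}(tau) with (-1)^j d_1 ... d_j for every j.
prod = 1.0
errors = []
for j, dj in enumerate(d, start=1):
    prod *= dj
    errors.append(abs(chi[j - 1] - (-1) ** j * prod))
```

The pivot recurrence breaks down if some $d_i$ is exactly zero; the sketch makes no attempt to guard against that case.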

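We turn now to approximations and how to assess them. Let

$$T^{1:j} s_i^{(j)} = s_i^{(j)} \theta_i^{(j)}, \qquad \|s_i^{(j)}\| = 1,$$

define an eigenvector of a submatrix. Consider $s_i^{(j)}$ as an approximate eigenvector
of T and append zeros to $s_i^{(j)}$ to make it conform. Now drop the
subscript i and observe that

$$(T - \theta^{(j)} I)\,s^{(j)} = e_{j+1}\,\beta_j\,s^{(j)}(j). \qquad (10)$$

Paige's Persistence Theorem. Write $\theta_i^{(j)}$ for $\theta_i^{1:j}$. For any j, $1 \le j < n$,
and any i, $1 \le i \le j$, the closed interval

$$[\theta_i^{(j)} - \beta_j\nu_i,\ \theta_i^{(j)} + \beta_j\nu_i]$$

contains at least one Ritz value $\theta^{(j+l)}$ for each $l = 1, 2, \ldots, n-j$. Here
$\nu_i = |s_i^{(j)}(j)|$.

Proof. By partitioning T one sees that

$$(T^{1:j+l} - \theta_i^{(j)} I)\begin{pmatrix} s_i^{(j)} \\ 0 \end{pmatrix} = e_{j+1}\,\beta_j\,s_i^{(j)}(j).$$

So, by a standard theorem, see Chap. 4 in [4], there is a $\theta^{(j+l)}$ such that

$$|\theta^{(j+l)} - \theta_i^{(j)}| \le \|e_{j+1}\,\beta_j\,s_i^{(j)}(j)\| = \beta_j\nu_i.$$

□

Frequently the bound $\beta_j\nu_i$ is a severe overestimate. Whenever $\theta_i^{(j)}$ is
isolated from the other Ritz values then one defines

gap(v, j, l) = gap(i) := min

Gap Theorem. For $l = 1, 2, \ldots, n-j$ there is a $\theta^{(j+l)}$ such that

$$|\theta^{(j+l)} - \theta_i^{(j)}| \le (\beta_j\nu_i)^2/\mathrm{gap}(v, j, l).$$

See [4, Chap. 11].

This is a huge improvement over $\beta_j\nu_i$ in most cases.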
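Both the residual identity (10) and the persistence bound can be observed directly on the second-difference matrix, because every leading submatrix has closed-form eigenpairs. The sketch below is ours; the sizes n = 12, j = 5 and the helper names are assumptions made for illustration only.

```python
import math

def eig(k, m):
    """k-th eigenvalue of the m x m second-difference matrix tridiag(1, -2, 1)."""
    return -2.0 + 2.0 * math.cos(k * math.pi / (m + 1))

def vec(k, m):
    """Matching normalized eigenvector."""
    c = math.sqrt(2.0 / (m + 1))
    return [c * math.sin(i * k * math.pi / (m + 1)) for i in range(1, m + 1)]

n, j, k = 12, 5, 1                  # full order, submatrix order, chosen Ritz pair
alpha, beta = [-2.0] * n, [1.0] * (n - 1)

theta = eig(k, j)                   # Ritz value of T^{1:j}
s = vec(k, j) + [0.0] * (n - j)     # Ritz vector padded with zeros

# Residual (T - theta*I) s, formed row by row.
r = []
for i in range(n):
    t = (alpha[i] - theta) * s[i]
    if i > 0:
        t += beta[i - 1] * s[i - 1]
    if i < n - 1:
        t += beta[i] * s[i + 1]
    r.append(t)

bound = beta[j - 1] * abs(s[j - 1])     # beta_j * |s(j)|

# Distance from theta to the nearest Ritz value of T^{1:j+l}, l = 1..n-j.
nearest = [min(abs(eig(i, j + l) - theta) for i in range(1, j + l + 1))
           for l in range(1, n - j + 1)]
```

The residual has a single nonzero entry, in position j + 1, and every larger leading submatrix keeps a Ritz value within `bound` of `theta`, as the theorem asserts.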


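The Average-β Result

Lemma 1 For $i = 1, 2, \ldots, n$,

$$(\beta_1\cdots\beta_{n-1})^{1/(n-1)} < |\chi'(\lambda_i)|^{1/(n-1)} = \Big(\prod_{k\ne i}|\lambda_i - \lambda_k|\Big)^{1/(n-1)}.$$

Proof. For each i, $1 \le i \le n$,

$$s_i(1)\,s_i(n)\,\chi'(\lambda_i) = \beta_1\cdots\beta_{n-1}$$

and

$$\chi'(\lambda_i) = \prod_{k\ne i}(\lambda_i - \lambda_k).$$

Since the geometric mean is majorized by the arithmetic mean

$$|s_i(1)\,s_i(n)| \le \tfrac12\left(s_i(1)^2 + s_i(n)^2\right) \le \tfrac12.$$

Hence

$$\beta_1\cdots\beta_{n-1} \le \tfrac12\,|\chi'(\lambda_i)|$$

and

$$(\beta_1\cdots\beta_{n-1})^{1/(n-1)} \le \left(\tfrac12\,|\chi'(\lambda_i)|\right)^{1/(n-1)} < |\chi'(\lambda_i)|^{1/(n-1)},$$

as claimed. □

Quantities of the form $\beta_j/\mathrm{gap}$ will be O(1) on average and they occur in
our analysis. Speaking loosely we may say that the average β is bounded by
the average distance of any eigenvalue from all the others.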

4  On  First  and  Last  Entries 

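Let $\lambda_-$ and $\lambda_+$ be two adjacent eigenvalues of T well separated from the
remaining spectrum. Let

$$T s_\pm = s_\pm \lambda_\pm, \qquad \|s_\pm\| = 1.$$

From (6) in Section 3,

$$s_\pm(1)\,s_\pm(n)\,\chi'(\lambda_\pm) = \beta_1\cdots\beta_{n-1}.$$

Thus

$$\frac{s_+(1)\,s_+(n)}{s_-(1)\,s_-(n)} = \frac{\chi'(\lambda_-)}{\chi'(\lambda_+)} = \frac{(\lambda_- - \lambda_+)\prod''(\lambda_- - \lambda_j)}{(\lambda_+ - \lambda_-)\prod''(\lambda_+ - \lambda_j)}, \qquad \text{where } {\textstyle\prod''} \text{ means } \lambda_j \ne \lambda_\pm,$$

$$= -1 + (\lambda_+ - \lambda_-)\sum{}''\frac{1}{\lambda_+ - \lambda_j} + \cdots$$

$$= -1 + O\!\left(\frac{\lambda_+ - \lambda_-}{\mathrm{gap}}\right),$$

where $\mathrm{gap} = \min_j |\lambda_+ - \lambda_j|$. So, when $\lambda_+ - \lambda_- \ll \mathrm{gap}$,

$$\frac{s_+(1)}{s_-(1)} \approx -\frac{s_-(n)}{s_+(n)}.$$

If $|s_+(1)| \gg |s_-(1)|$ then the support of $s_+$ is concentrated in the top of an
n-vector whereas the support of $s_-$ is concentrated in the lower section. Thus
the supports may be nearly disjoint, however close $\lambda_-$ and $\lambda_+$ may be. When
$|s_+(1)| \approx |s_-(1)|$ then $s_+$ and $s_-$ have the same support and it is easier to
approximate the bisectors $(s_+ \pm s_-)/\sqrt{2}$.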


5  An  Example:  W21 


$$W = W_{21}^{+} = \mathrm{tridiag}\begin{pmatrix} 1 & 1 & \cdots & & \cdots & 1 & \\ 10 & 9 & \cdots & 0 & \cdots & 9 & 10 \\ 1 & 1 & \cdots & & \cdots & 1 & \end{pmatrix}.$$

This matrix, and its companion $W_{21}^{-}$, were designed by Wilkinson, see Ch. 6
in [6], to illustrate some subtle points concerning the computation of eigenvectors.
Although the eigenvalues of W are distinct in exact arithmetic there
are several pairs that are equal to single precision ($\varepsilon = 1.2 \times 10^{-7}$). However
the smaller eigenvalue pairs are not so close and the only negative eigenvalue
is well separated from the rest. Table 1 gives selected eigenvalues. Note that
$W_{21}^{+}$ is persymmetric: invariant under reversal.

‘If we separate $W_{21}^{+}$ into a direct sum by putting $\beta_{11} = 0$ then we
obtain  independent  orthogonal  vectors  which  span  the  subspace 
corresponding to the pathologically close pair of eigenvalues $\lambda_{20}$
and $\lambda_{21}$ to a very high accuracy. It is therefore quite possible that
such  a  decomposition  is  always  permissible,  and  what  is  needed  is 
some  reliable  method  of  deciding  when  and  where  to  decompose.’ 

J.  H.  Wilkinson 

(from  ‘The  Algebraic  Eigenvalue  Problem’,  1965, 

Chapter  5,  p.  330.) 

This quotation shows that Wilkinson considered the possibility of using
disjoint submatrices to obtain orthogonal basis vectors. Our investigations
show that this idea only works in extreme, and easy, cases. What we illustrate
here is that the submatrices must be allowed to overlap.

    $\lambda_{21}$    10.746194182903393..
    $\lambda_{20}$    10.746194182903322..
    $\lambda_{13}$     6.0002340..
    $\lambda_{12}$     6.0002175..
    $\lambda_{3}$      0.9475..
    $\lambda_{2}$      0.2538..
    $\lambda_{1}$     -1.1254..

Table 1: Selected Eigenvalues of W21

In  a  nice  recent  study,  see  [7],  Ye  has  turned  Wilkinson’s  idea  into  some 
theorems  relating  a  pair  of  eigenvalues  of  T  to  an  eigenvalue  of  a  certain 
submatrix  and  an  eigenvalue  of  the  complementary  submatrix.  However  he 
does  not  give  results  on  the  related  eigenvectors  and  that  is  our  concern. 

The largest pair: $\lambda_{20}$ and $\lambda_{21}$ (equal to single precision at value $\lambda$)

We drop the last 2 rows and approximate the Ritz vector $z_+$ by solving

$$(W^{1:19} - \lambda I)z_+ = e_1\gamma_1, \qquad \|z_+\| = 1.$$

We drop the first 2 rows and approximate the Ritz vector $z_-$ by solving

$$(W^{3:21} - \lambda I)z_- = e_{21}\gamma_{21}, \qquad \|z_-\| = 1.$$

We insert zero entries to make $z_-$ and $z_+$ conform to W. It turns out that
z+  .  z_  =  1.22  X  10"'^  !,  71  =  -8.33  x  10-^  721  =  -8.33  x  10"^ 


Thus $z_-$ and $z_+$ pass both requirements for good eigenvectors, orthogonality
and small residuals. However an equally valid basis for the dominant
invariant 2-space is $\{(z_+ - z_-)/\sqrt{2},\ (z_+ + z_-)/\sqrt{2}\}$, the bisectors. It is
gratifying that this computed basis delivers the exact eigenvectors correct to
single precision, even the smallest entries.

The results from dropping three rows instead of two were indistinguishable
from the ones above. In fact this case is so easy that one can use
submatrices $(1 : j)$ and $(22 - j : 21)$ for $j = 14, \ldots, 20$, for satisfactory
results.
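The experiment for the largest pair can be imitated in double precision with a few lines of Python. This is a sketch under stated assumptions, not the computation reported above: we take the diagonal of W to be 10, 9, ..., 1, 0, 1, ..., 9, 10 with unit off-diagonals, and we replace the linear solves of the text by plain power iteration on each submatrix, which suffices here because the wanted Ritz value is the dominant one. All function names are ours.

```python
import math

N = 21
alpha = [float(abs(11 - i)) for i in range(1, N + 1)]   # 10, 9, ..., 0, ..., 9, 10
beta = [1.0] * (N - 1)

def matvec(lo, hi, x):
    """Apply the principal submatrix (lo:hi) of W (1-based, inclusive) to x."""
    m = hi - lo + 1
    y = []
    for i in range(m):
        t = alpha[lo - 1 + i] * x[i]
        if i > 0:
            t += beta[lo - 2 + i] * x[i - 1]
        if i < m - 1:
            t += beta[lo - 1 + i] * x[i + 1]
        y.append(t)
    return y

def norm(x):
    return math.sqrt(sum(t * t for t in x))

def top_eigpair(lo, hi, iters=600):
    """Power iteration for the rightmost eigenpair of submatrix (lo:hi)."""
    x = [1.0] * (hi - lo + 1)
    for _ in range(iters):
        y = matvec(lo, hi, x)
        nrm = norm(y)
        x = [t / nrm for t in y]
    theta = sum(a * b for a, b in zip(x, matvec(lo, hi, x)))
    return theta, x

th_p, zp = top_eigpair(1, 19)
th_m, zm = top_eigpair(3, 21)
z_plus = zp + [0.0, 0.0]        # zeros appended at the bottom
z_minus = [0.0, 0.0] + zm       # zeros inserted at the top

dot = sum(a * b for a, b in zip(z_plus, z_minus))

def residual_norm(z, rho):
    """Norm of (W - rho*I) z for the full matrix W."""
    w = matvec(1, N, z)
    return norm([wi - rho * zi for wi, zi in zip(w, z)])

res_plus = residual_norm(z_plus, th_p)
res_minus = residual_norm(z_minus, th_m)

# Internal and external bisectors of the two padded vectors.
b_sym = [(a + b) / math.sqrt(2.0) for a, b in zip(z_plus, z_minus)]
b_anti = [(a - b) / math.sqrt(2.0) for a, b in zip(z_plus, z_minus)]
```

In this sketch the padded vectors are nearly orthogonal and have tiny residuals with respect to the full matrix, and the two bisectors are symmetric and antisymmetric under reversal, as persymmetry of W requires.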

The pair near 6: $\lambda_{12}$ and $\lambda_{13}$

The shift invariant measure of a symmetric matrix is its spread defined as
$\lambda_{\max} - \lambda_{\min}$. In this case

spread(W) = $\lambda_{21} - \lambda_1$ = 11.87,
interval width = $\lambda_{13} - \lambda_{12}$ = 11.66 · ε · spread.

The two best choices are (1 : 15), (7 : 21) and (1 : 17), (5 : 21). The outputs
from the two choices are barely distinguishable in single precision. The
Ritz value ρ for (1 : 17) and (5 : 21) is not exactly at the mean but, in single
precision, appears to be so:

ρ = 6.000226.

Approximate the corresponding Ritz vectors by solving

$$(W^{1:17} - \rho I)z_+ = e_6\gamma_6, \qquad (W^{5:21} - \rho I)z_- = e_{15}\gamma_{15}$$

in single precision to find

$$\|(W - \rho I)z_+\| = \|(W - \rho I)z_-\| = 1.4 \times 10^{-5} = 10\cdot\varepsilon\cdot\mathrm{spread}$$

and

$$z_+ \cdot z_- = 2.1 \times 10^{-5} = 8.4\cdot n\cdot\varepsilon.$$

Section 9 shows how the subspace span$\{z_-, z_+\}$ is slightly more accurate
than indicated by the residual norms for $z_-$ and $z_+$ separately.

In several ways $\{z_-, z_+\}$ is a satisfactory basis but the pair of bisectors
$\{(z_+ - z_-)/\sqrt{2},\ (z_+ + z_-)/\sqrt{2}\}$ is even better. Just as for the pair $\lambda_{20}$, $\lambda_{21}$
the computed bisectors deliver the exact eigenvectors to single precision; even
the smallest entries were correct to 6 decimals. Figure 1 shows $z_-$ and
$z_+$ while Figure 2 shows the bisectors. An unorthodox representation has
been used in order to emphasize the small entries. Instead of $v(i)$ we plot
$(-1/\log_{10}|v(i)|)\,\mathrm{sign}(v(i))$ and the vector is normalized so that the maximum
value is 1. To show the distortion we also plot $(z_+ + z_-)/\sqrt{2}$ in the standard
manner at the top of Figure 2.
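The plotting transformation can be written down in a few lines. The sketch below is one reading of the description, and the reading itself is an assumption: we apply the map entry by entry and then rescale the transformed values so that the largest magnitude is 1. It requires every entry to satisfy 0 < |v(i)| < 1, since the logarithm vanishes at magnitude 1. The name `log_scaled` is ours.

```python
import math

def log_scaled(v):
    """Map v(i) to (-1/log10|v(i)|) * sign(v(i)), then rescale so that the
    largest magnitude is 1. Assumes 0 < |v(i)| < 1 for every entry."""
    out = []
    for x in v:
        sign = 1.0 if x > 0 else -1.0
        out.append(-sign / math.log10(abs(x)))
    peak = max(abs(t) for t in out)
    return [t / peak for t in out]

v = [0.1, -0.01, 1e-4, -1e-8]
t = log_scaled(v)
```

Entries that differ by many orders of magnitude are mapped to values that differ only by small factors, which is what makes the tiny entries visible in a plot.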

A careful look at $z_-$ and $z_+$ shows that Wilkinson's idea of using disjoint
submatrices could not be satisfactory; entries $z_+(12), \ldots, z_+(15)$ and
$z_-(5), \ldots, z_-(8)$ are not negligible. However $\{z_-, z_+\}$ is the most sparse basis
of the invariant subspace for $\lambda_{12}$, $\lambda_{13}$. So there is no basis in which half the
entries in each vector are negligible, in contrast to the case for $\lambda_{20}$, $\lambda_{21}$. The
overlap of the submatrices is needed to obtain the middle entries accurately.

These remarks do not contradict the fact that Wilkinson constructed $W_{21}^{+}$
so that each pair of eigenvectors could be built out of two small vectors u
and v. Here

$$(W^{1:10} - \lambda_{12} I)u = 0, \qquad (\widetilde{W}^{1:11} - \lambda_{13} I)v = 0,$$

where $\widetilde{W}^{1:11}$, which is not symmetric, differs from $W^{1:11}$ by replacing entry
(11,10) by 2. The eigenvectors for $\lambda_{12}$ and $\lambda_{13}$ are

$$(u(1), \ldots, u(10),\ 0,\ -u(10), \ldots, -u(1))^{*},$$

$$(v(1), \ldots, v(10),\ v(11),\ v(10), \ldots, v(1))^{*}.$$

Wilkinson's idea of using (1:11) and (12:21) (with $\beta_{11} = 0$) will not work here
because $u(7:10) \ne v(7:10)$ to working accuracy in contrast to the case of
$\lambda_{20}$, $\lambda_{21}$.

Inverse iteration, even with well chosen right hand sides, gives poor results
unless the Gram-Schmidt process is used heavily. The output from the
LAPACK code sstein was not as accurate as our bisectors.

The following sections show that well chosen submatrices yield good bases
in all cases when the cluster is reasonably isolated from the remaining spectrum.
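6  The  Envelope  of  the  Invariant  Subspace

Let $\{\lambda_l, \lambda_{l+1}, \ldots, \lambda_{l+m-1}\}$ be a set of eigenvalues of T well separated from
the rest of the spectrum. Let

$$T z_i = z_i \lambda_i, \qquad \|z_i\|_2 = 1, \qquad (11)$$

and define $\mathcal{I} := [\lambda_l, \lambda_{l+m-1}]$ and

$$\mathcal{S}_{\mathcal{I}} = \mathrm{span}\{z_l, \ldots, z_{l+m-1}\}.$$

The envelope vector of $\mathcal{S}_{\mathcal{I}}$ is $\mathcal{E}_{\mathcal{I}}$ given by

$$\mathcal{E}(j) = \mathcal{E}_{\mathcal{I}}(j) := \max\{v(j) : v \in \mathcal{S}_{\mathcal{I}},\ \|v\|_2 = 1\}.$$

The extremal vectors in $\mathcal{S}_{\mathcal{I}}$ are characterized by

$$y^{(i)}(i) = \mathcal{E}(i), \qquad \|y^{(i)}\|_2 = 1, \qquad i = 1 \text{ and } n.$$

We consider the case m = 2 (close pairs) here although some of the results
may be extended to larger clusters. The subvectors that are useful in computing
an orthogonal basis for $\mathcal{S}_{\mathcal{I}}$ may be understood as approximations to
these extremal vectors. A little more notation is needed before the results of
this section can be described. By Lemma 2 (proved below)

$$\mathcal{E}(1)^2 = z_l(1)^2 + z_{l+1}(1)^2,$$

and $y^{(1)}$ may be expressed as

$$y^{(1)} = z_l\cos\varphi + z_{l+1}\sin\varphi, \qquad \tan\varphi = \frac{z_{l+1}(1)}{z_l(1)},$$

and $\varphi$ plays an important role in this section. We may assume that $z_l(1) > 0$,
$z_{l+1}(1) > 0$. The Rayleigh quotient of $y^{(1)}$ is

$$\rho_1 = y^{(1)*}\,T\,y^{(1)} = \lambda_l\cos^2\varphi + \lambda_{l+1}\sin^2\varphi. \qquad (12)$$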

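The residual of $y^{(1)}$ and the residual norm are

$$(T - \rho_1 I)\,y^{(1)} = z_l\cos\varphi\,(\lambda_l - \rho_1) + z_{l+1}\sin\varphi\,(\lambda_{l+1} - \rho_1),$$

$$\nu_1^2 := \|T y^{(1)} - y^{(1)}\rho_1\|^2 = (\lambda_l - \rho_1)^2\cos^2\varphi + (\lambda_{l+1} - \rho_1)^2\sin^2\varphi$$

$$= \left(\sin 2\varphi\cdot\frac{\lambda_{l+1} - \lambda_l}{2}\right)^2, \quad \text{by (12)}, \qquad (13)$$

defining the residual norm $\nu_1$ which occurs throughout this section.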
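The closed form (13) is easy to confirm numerically. In the orthonormal basis $\{z_l, z_{l+1}\}$ the matrix T acts as a 2 x 2 diagonal matrix, so the whole computation reduces to two coordinates. The sketch below is ours; the eigenvalue pair is taken from Table 1 and the angles are arbitrary.

```python
import math

def rayleigh_and_residual(lam_l, lam_r, phi):
    """In the basis {z_l, z_{l+1}} T acts as diag(lam_l, lam_r) and
    y = (cos phi, sin phi). Returns rho_1 from (12) and the residual norm."""
    c, s = math.cos(phi), math.sin(phi)
    rho = lam_l * c * c + lam_r * s * s
    res = math.hypot((lam_l - rho) * c, (lam_r - rho) * s)
    return rho, res

lam_l, lam_r = 6.0002175, 6.0002340     # the pair near 6 from Table 1
checks = []
for phi in (0.1, 0.5, math.pi / 4, 1.2):
    rho, res = rayleigh_and_residual(lam_l, lam_r, phi)
    closed_form = abs(math.sin(2 * phi)) * (lam_r - lam_l) / 2   # (13)
    checks.append((rho, res, closed_form))
```

The residual norm never exceeds half the cluster length, with equality at $\varphi = \pi/4$, where $y^{(1)}$ is a bisector of the two eigenvectors.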

Our goal is to find an index j, 1 < j < n, such that the eigenvector

$$T^{1:j}s^{1:j} = s^{1:j}\theta^{1:j}, \qquad \theta^{1:j} \in \mathcal{I},$$

satisfies $\beta_j|s^{1:j}(j)| = O(\lambda_{l+1} - \lambda_l)$. The connection
of such j to the envelope vector $\mathcal{E}$ is given by the principal results of this
section which we summarize first.

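For each j such that $T^{1:j}$ has an eigenvalue $\theta^{1:j} \in \mathcal{I}$, Theorem 1 shows
that

$$\beta_j\,|s^{1:j}(j)|\;\mathcal{E}(j+1) \approx \left[\nu_1^2 + (\theta^{1:j} - \rho_1)^2\right]^{1/2},$$

provided that $(\lambda_{l+1} - \lambda_l)/\mathrm{gap} < 1$, where

$$\mathrm{gap} = \min\{\lambda_{l+2} - \theta^{1:j},\ \theta^{1:j} - \lambda_{l-1}\}. \qquad (14)$$

For W21 and the dominant pair $\lambda_{20}$, $\lambda_{21}$ one finds $\mathcal{E}(1) = \mathcal{E}(21) \approx 0.78$ and
$\mathcal{E}(2) = \mathcal{E}(20) \approx 0.58$. Thus j = 19 or 20 is a good choice.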

If  does  not  have  an  eigenvalue  in  J  then,  by  Lemma  6,  has  an 

eigenvalue  in  J  and 


(sin  2V-  •  ^')  +  -  Pnf 


sin  29? 


A/+1  —  A; 


^  (e'--’ -  Pif 


*7  - 


I'U 


where $y^{(n)} = z_l\cos\psi - z_{l+1}\sin\psi$, $\rho_n = y^{(n)*}\,T\,y^{(n)}$.

Note that $\|\mathcal{E}\|_2 = \sqrt{2}$ and the average value of $\mathcal{E}$'s entries is $\sqrt{2/n}$. Theorem 1
tells us to locate $\mathcal{E}$'s maximal entries to get small values of $|s^{1:j}(j)|$.

It turns out that the vector $\begin{pmatrix} s^{1:j} \\ 0 \end{pmatrix}$, for suitable j, is a cheap approximation
to $y^{(1)}$. Throughout this section we abbreviate $\chi^{1:j}$ by $\chi_j$. For any j that
keeps $|\theta^{1:j} - \rho_1|/(\lambda_{l+1} - \lambda_l)$ small, Lemma 8 says

+  1) 

whereas 


+  0 


s^-’^(i-l-l)  = 


0,  i  >  j- 
Note that $\theta^{1:j}$ only occurs in the second (and small) term in $y^{(1)}(i+1)$, not
the dominant first term. The difference between $y^{(1)}$ and $s^{1:j}$ depends on the
two ratios, $(\lambda_{l+1} - \lambda_l)/\mathrm{gap}$ and $|\theta^{1:j} - \rho_1|/(\lambda_{l+1} - \lambda_l)$. In the same vein we
show (following Lemma 7) that, for the appropriate j values,

$$s^{(j)}(1) = \mathcal{E}(1)\left[1 + O\!\left(\frac{\lambda_{l+1} - \lambda_l}{\mathrm{gap}(\theta^{1:j})}\right)\right], \qquad \text{if } \mathcal{E}(1) \text{ is not too small},$$

where $\mathrm{gap}(\theta^{1:j}) = \min|\theta - \theta^{1:j}|$ over all eigenvalues $\theta$ of $T^{1:j}$ other than $\theta^{1:j}$
and is slightly different from gap defined in (14).

Indeed very good approximations to the eigenvectors $z_l$ and $z_{l+1}$ are given
by the internal and the external bisectors of $y^{(1)}$ and $y^{(n)}$. The extremal
vectors $y^{(1)}$ and $y^{(n)}$ are not quite orthogonal:


j^(l)  .  y(«)  ft.  (A/+1  -  A;)  ^  (A;  -  A,)  L 


Proofs 

To  establish  all  these  results  in  a  simple  way  a  sequence  of  lemmata  will 
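prove useful. Recall that $\mathcal{S}_{\mathcal{I}} = \mathrm{span}\{z_l, \ldots, z_{l+m-1}\}$.

Lemma 2

$$\mathcal{E}(j)^2 = \sum_{i=l}^{l+m-1} z_i(j)^2.$$

Proof:

$$\mathcal{E}(j)^2 = \max\left\{\Big(\sum_{i=l}^{l+m-1}\gamma_i z_i(j)\Big)^2 : \|\gamma\| = 1\right\} \le \Big(\sum_{i=l}^{l+m-1}\gamma_i^2\Big)\Big(\sum_{i=l}^{l+m-1} z_i(j)^2\Big) = \sum_{i=l}^{l+m-1} z_i(j)^2.$$

Equality is attained when $\gamma$ is a multiple of $(z_l(j), \ldots, z_{l+m-1}(j))$.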
Corollary 1

$$\|\mathcal{E}\|^2 = \sum_{j=1}^{n}\sum_{i=l}^{l+m-1} z_i(j)^2 = \sum_{i=l}^{l+m-1}\sum_{j=1}^{n} z_i(j)^2 = \sum_{i=l}^{l+m-1} 1 = m.$$
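Lemma 2 and Corollary 1 hold for any orthonormal set of vectors, so they can be illustrated without a genuine cluster. The sketch below is ours: it uses two eigenvectors of the second-difference matrix (known in closed form) as the basis, computes the envelope from the sum of squares, and compares it with a brute-force maximum over unit vectors in the span. The values of n, m, l and the sampled position are arbitrary choices.

```python
import math

n, m, l = 10, 2, 4      # matrix order, subspace dimension, first index (illustrative)

def vec(k):
    """k-th normalized eigenvector of the n x n second-difference matrix."""
    c = math.sqrt(2.0 / (n + 1))
    return [c * math.sin(j * k * math.pi / (n + 1)) for j in range(1, n + 1)]

Z = [vec(k) for k in range(l, l + m)]       # orthonormal basis of the subspace

# Lemma 2: E(j)^2 is the sum of squares of the j-th entries of the basis vectors.
E = [math.sqrt(sum(z[j] ** 2 for z in Z)) for j in range(n)]

# Corollary 1: ||E||^2 = m.
norm_sq = sum(e * e for e in E)

# The maximum is attained for gamma proportional to (z_l(j), ..., z_{l+m-1}(j)).
j = 2
gamma = [z[j] / E[j] for z in Z]
y = [sum(g * z[i] for g, z in zip(gamma, Z)) for i in range(n)]
attained = y[j]

# No sampled unit vector in the span exceeds E(j) in position j.
samples = []
for t in range(360):
    a = 2 * math.pi * t / 360
    g = (math.cos(a), math.sin(a))
    samples.append(sum(gi * z[j] for gi, z in zip(g, Z)))
largest_sample = max(samples)
```

The vector `y` built here is the extremal vector for position j + 1 (in 1-based indexing), the analogue of $y^{(1)}$ and $y^{(n)}$ for an interior position.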


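Lemma 3 With $\rho_1 = y^{(1)*}Ty^{(1)} = \lambda_l\cos^2\varphi + \lambda_{l+1}\sin^2\varphi$ then for any $\xi$

$$(\lambda_l - \xi)^2\cos^2\varphi + (\lambda_{l+1} - \xi)^2\sin^2\varphi = \nu_1^2 + (\xi - \rho_1)^2,$$

$$\nu_1 = \sin 2\varphi\left(\frac{\lambda_{l+1} - \lambda_l}{2}\right).$$

Proof.

$$(\lambda_l - \rho_1 + \rho_1 - \xi)^2\cos^2\varphi + (\lambda_{l+1} - \rho_1 + \rho_1 - \xi)^2\sin^2\varphi$$

$$= \nu_1^2 + 2(\rho_1 - \xi)\left[\lambda_l\cos^2\varphi + \lambda_{l+1}\sin^2\varphi - \rho_1\right] + (\rho_1 - \xi)^2$$

$$= \nu_1^2 + (\rho_1 - \xi)^2.$$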

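Lemma 4 For any polynomial $\chi$ of degree d with an isolated zero $\theta$,

$$\chi(\xi) = (\xi - \theta)\,\chi'(\theta)\left[1 + \frac{\xi - \theta}{2\,\mathrm{gap}(\theta)}\cdot\frac{\mathrm{gap}(\theta)\,\chi''(\theta)}{\chi'(\theta)} + O\!\left(\Big(\frac{\xi - \theta}{\mathrm{gap}(\theta)}\Big)^2\right)\right],$$

where gap($\theta$) equals the separation of $\theta$ from the rest of $\chi$'s zeros. Also

$$\left|\frac{\mathrm{gap}(\theta)\,\chi''(\theta)}{\chi'(\theta)}\right| \le 2(d-1).$$

Proof. A Taylor series expansion of $\chi$ around $\theta$ yields

$$\chi(\xi) = 0 + (\xi - \theta)\chi'(\theta) + \tfrac12(\xi - \theta)^2\chi''(\theta) + \tfrac16(\xi - \theta)^3\chi'''(\eta)$$

for some $\eta$ in the convex hull of $\xi$ and $\theta$. There is a simple expression
for $\chi''(\theta)/\chi'(\theta)$ if we denote the zeros of $\chi$ by $\theta_1, \ldots, \theta_d$ and $\theta = \theta_j$. By
logarithmic differentiation,

$$\frac{\chi''(\theta_j)}{\chi'(\theta_j)} = 2\sum_{k\ne j}\frac{1}{\theta_j - \theta_k},$$

whence

$$\left|\frac{\mathrm{gap}(\theta_j)\,\chi''(\theta_j)}{\chi'(\theta_j)}\right| = \left|2\sum_{k\ne j}\frac{\theta_j - \theta_{j\pm 1}}{\theta_j - \theta_k}\right| \le 2(d-1).$$

□

Remark. For evenly spaced zeros the upper bound is 2 ln(d - 1) not 2(d - 1).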
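Both parts of Lemma 4 can be tried out on a polynomial given by its zeros. The sketch below is ours; the zeros are arbitrary sample data and the helper names are hypothetical. It evaluates the scaled ratio through the logarithmic-derivative formula from the proof and compares the first-order model $(\xi - \theta)\chi'(\theta)$ with the exact value near one zero.

```python
import math

def gap_ratio(zeros, j):
    """gap(theta_j) * chi''(theta_j) / chi'(theta_j) for the monic polynomial
    with the given simple zeros, using logarithmic differentiation."""
    tj = zeros[j]
    others = [t for k, t in enumerate(zeros) if k != j]
    gap = min(abs(tj - t) for t in others)
    return gap * 2.0 * sum(1.0 / (tj - t) for t in others)

def chi(zeros, x):
    """Value of the monic polynomial with the given zeros at x."""
    return math.prod(x - t for t in zeros)

def dchi(zeros, j):
    """Derivative of that polynomial at its j-th zero."""
    return math.prod(zeros[j] - t for k, t in enumerate(zeros) if k != j)

zeros = [-3.0, -1.0, 0.5, 0.7, 2.0, 6.0]    # arbitrary sample zeros
d = len(zeros)
ratios = [gap_ratio(zeros, j) for j in range(d)]

# First-order model chi(xi) ~ (xi - theta) * chi'(theta) near theta = zeros[1].
j = 1
xi = zeros[j] + 1e-3
exact = chi(zeros, xi)
model = (xi - zeros[j]) * dchi(zeros, j)
rel_err = abs(exact - model) / abs(model)
```

The relative error of the linear model is of the order of $(\xi - \theta)/\mathrm{gap}(\theta)$, which is the form in which the lemma is used in the proof of Theorem 1.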


Lemma 5 For each j, 1 < j < n, either $\chi^{1:j-1}$ has a zero in $[\lambda_l, \lambda_{l+1}]$ or
$\chi^{j+1:n}$ has a zero in $[\lambda_l, \lambda_{l+1}]$.

Proof. The characteristic polynomial of the submatrix of T obtained by deleting
row and column j is $\chi^{1:j-1}\chi^{j+1:n}$. By Cauchy's interlace theorem
one of these polynomials (at least) has a zero in $[\lambda_i, \lambda_{i+1}]$ for $i = 1, \ldots, n-1$.
It can be shown that if (and only if) both polynomials have a zero in $[\lambda_i, \lambda_{i+1}]$
then it is either $\lambda_i$ or $\lambda_{i+1}$.
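The interlacing argument can be observed on the second-difference matrix, where deleting row and column j leaves two smaller second-difference blocks with known eigenvalues. The sketch is ours; n = 9 and j = 4 are arbitrary, and a small tolerance absorbs rounding when a zero coincides with an endpoint.

```python
import math

def eigs(m):
    """Eigenvalues of the m x m second-difference matrix, in ascending order."""
    return sorted(-2.0 + 2.0 * math.cos(k * math.pi / (m + 1)) for k in range(1, m + 1))

n, j = 9, 4                 # delete row and column j (1-based)
lam = eigs(n)               # eigenvalues of T
top = eigs(j - 1)           # zeros of chi^{1:j-1}
bottom = eigs(n - j)        # zeros of chi^{j+1:n}

tol = 1e-12
covered = []
for i in range(n - 1):
    lo, hi = lam[i] - tol, lam[i + 1] + tol
    in_top = any(lo <= t <= hi for t in top)
    in_bottom = any(lo <= t <= hi for t in bottom)
    covered.append(in_top or in_bottom)
```

Every interval between consecutive eigenvalues of T is covered by a zero of one of the two factors, so for any close pair at least one of the two complementary submatrices supplies a Ritz value inside the cluster interval.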

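Now we can establish the principal result of this section.

Theorem 1 With the notation developed in this section, if $\chi_j$ has a zero $\theta^{1:j}$
in $(\lambda_l, \lambda_{l+1})$ then

$$\beta_j\,\big|s^{1:j}(1)\,s^{1:j}(j)\big|\,\frac{\mathcal{E}(j+1)}{\mathcal{E}(1)} = \left[\nu_1^2 + (\theta^{1:j} - \rho_1)^2\right]^{1/2}\left[1 + O\!\left(\frac{\lambda_{l+1} - \lambda_l}{\mathrm{gap}(\theta^{1:j})}\right)\right].$$

Proof. By Lemma 2 and (4) in Section 3

$$\mathcal{E}(j+1)^2 = z_l(j+1)^2 + z_{l+1}(j+1)^2 = \left[z_l(1)^2\chi_j(\lambda_l)^2 + z_{l+1}(1)^2\chi_j(\lambda_{l+1})^2\right]/(\beta_1\cdots\beta_j)^2.$$

Square Lemma 4, for $\chi = \chi_j$ and its zero $\theta^{1:j}$:

$$[\beta_1\cdots\beta_j\,\mathcal{E}(j+1)]^2 = \chi_j'(\theta^{1:j})^2\,z_l(1)^2(\lambda_l - \theta^{1:j})^2\left[1 + O\!\left(\frac{\lambda_l - \theta^{1:j}}{\mathrm{gap}(\theta^{1:j})}\right)\right]$$

$$\qquad + \chi_j'(\theta^{1:j})^2\,z_{l+1}(1)^2(\lambda_{l+1} - \theta^{1:j})^2\left[1 + O\!\left(\frac{\lambda_{l+1} - \theta^{1:j}}{\mathrm{gap}(\theta^{1:j})}\right)\right]$$

and

$$\beta_1\cdots\beta_j\,\mathcal{E}(j+1) = |\chi_j'(\theta^{1:j})|\left[z_l(1)^2(\lambda_l - \theta^{1:j})^2 + z_{l+1}(1)^2(\lambda_{l+1} - \theta^{1:j})^2\right]^{1/2}\left[1 + O\!\left(\frac{\lambda_{l+1} - \lambda_l}{\mathrm{gap}(\theta^{1:j})}\right)\right].$$

By Lemma 3, with $z_l(1) = \mathcal{E}(1)\cos\varphi$ and $z_{l+1}(1) = \mathcal{E}(1)\sin\varphi$,

$$\beta_1\cdots\beta_j\,\mathcal{E}(j+1) = |\chi_j'(\theta^{1:j})|\,\mathcal{E}(1)\left[\nu_1^2 + (\theta^{1:j} - \rho_1)^2\right]^{1/2}\left[1 + O\!\left(\frac{\lambda_{l+1} - \lambda_l}{\mathrm{gap}(\theta^{1:j})}\right)\right].$$

To relate $\mathcal{E}(j+1)$ to the submatrix $T^{1:j}$ and eigenvector $s^{1:j}$, recall (6)
in Section 3:

$$s^{1:j}(1)\,s^{1:j}(j)\,\chi_j'(\theta^{1:j}) = \beta_1\cdots\beta_{j-1}.$$

Substitute this expression into the equation above and cancel the nonzero
derivative to obtain

$$\beta_j\,\big|s^{1:j}(1)\,s^{1:j}(j)\big|\,\mathcal{E}(j+1) = \mathcal{E}(1)\left[\nu_1^2 + (\theta^{1:j} - \rho_1)^2\right]^{1/2}\left[1 + O\!\left(\frac{\lambda_{l+1} - \lambda_l}{\mathrm{gap}(\theta^{1:j})}\right)\right],$$

as claimed.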


Lemma 6 For all j such that $\mathcal{E}(j+1)$ is nearly maximal and $\chi_j$ has a zero
in $[\lambda_l, \lambda_{l+1}]$


s^^\lY  = 


eu  + 1) 


1  +  0 


A;+i  —  Af\ 
gapie^-^)] 


$\mathrm{gap}(j) = \min\{\lambda_{l+2} - \theta^{1:j},\ \theta^{1:j} - \lambda_{l-1}\}$, $\tau = \tau(\theta^{1:j})$ is given in (15).


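Proof. To analyze $s^{(j)}$ in terms of T rather than $T^{1:j}$ note that

$$(T - \theta^{1:j} I)\begin{pmatrix} s^{(j)} \\ 0 \end{pmatrix} = e_{j+1}\,\beta_j\,s^{(j)}(j).$$

Use $T = Z\Lambda Z^{*}$ to obtain

$$\begin{pmatrix} s^{(j)} \\ 0 \end{pmatrix} = Z(\Lambda - \theta^{1:j} I)^{-1}Z^{*}e_{j+1}\,\beta_j\,s^{(j)}(j)$$

and

$$s^{(j)}(1) = \beta_j\,s^{(j)}(j)\sum_{k=1}^{n}\frac{z_k(1)\,z_k(j+1)}{\lambda_k - \theta^{1:j}}$$

is taken positive. The terms corresponding to k = l, l + 1 usually dominate
the sum and it may be written

$$s^{(j)}(1) = \beta_j\,s^{(j)}(j)\left[\frac{z_l(1)\,z_l(j+1)}{\lambda_l - \theta^{1:j}} + \frac{z_{l+1}(1)\,z_{l+1}(j+1)}{\lambda_{l+1} - \theta^{1:j}} + \tau\right], \qquad (15)$$

and

$$\tau = \sum_{k\ne l,\,l+1}\frac{z_k(1)\,z_k(j+1)}{\lambda_k - \theta^{1:j}}.$$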


The  key  fact  is  that  the  first  two  terms  in  (1 5)  combine  to  give  5(l)E  Use 
(15)  to  obtain 


g;(l)'gKi  +  I)  +  I)  _ 

Xi  -  A,+i  - 


l  +  O 


l  +  O 


/  ■^i+i  ~  Xi 


V5ap(6>i-'j)yJ 


- 


( ~ 

\gap{6^-^) 


(16) 


Now  (6)  and  Theorem  I  give 

=  |«w(i).wu)ftl  =  («"')  ■ 


l  +  O 


/  Az+i  —  A/\ 
I  _ ) 


cr  • 
Multiply  (15)  by  and  then  use  (16)  to  find

{  ^i+i  ~ 


s^^\iY  =  e{i) 


/^1  •  •  •  l^j-1 
Now  use  (17)  to  simplify  both  terms 


1  +  0 


\gap{9^'-^] 


=  5(1)' 


1  +  0  ( 


5(;  + 1) 


1  +  0 


£(!)'  + 5(1)1/ 


•sign  {s^^\j)) 

5(i  +  l) 


f  hULZ-^ 

\gap{e'^--^)^ 

i  +  o( 


For well chosen j the first term $\mathcal{E}(1)^2$ usually dominates the second. When
$\mathcal{E}(1)$ is small then j must be large to ensure that $\theta^{1:j}$ lies in $[\lambda_l, \lambda_{l+1}]$ and it
is possible that $\mathcal{E}(1)^2$ always dominates but we have not proved that yet.
Note that

(i)  V  («'/)  ■  kl  <  Emw+i  +  >)l "  /«“?(/)  <  " 

is  assumed  to  be  very  small. 

(ii)  5(i  +  1)  >  (above  average,  by  choice  of  j). 

(iii)  By  Lemma  7,  (i)  and  (ii),  whenever 

5(l)5(i  +  1)  >  21/  lgap{j) 


then 


3(^)(1)  =  5(1) 


(  i/Tsign  (s(^)(i))^ 
5(1)5(;  +  1)  J 


1  +  0 


/Ahi-A/V 

\gap{e^-^)  J  _ 


i.  e.  s^'^^(l)  «  5(1). 
In Figures 3 and 4 we show typical envelopes.

Figure 3 shows snapshots of the envelope of a cluster of 108 close eigenvalues, each differing from its neighbors by $O(\epsilon \cdot \mathrm{spread})$, from a matrix of order n = 2053 supplied by George Fann, Pacific Northwest Laboratory, Richland, WA. The matrix arose in the application of the self-consistent field Hartree-Fock method for solving the nonlinear Schroedinger equation for Zeolite ZSM-5. The top shows the first 50 entries in $\mathcal{E}$ and the bottom shows entries 470-550. The cluster is completely determined by submatrix (1:515). The supports of all the eigenvectors belonging to this cluster lie in (1:515).

Figure  4  shows  the  envelope  of  span{zi2, 2:13)  for  see  Section  5.  The 
eigenvalues  are  close  to  6. 

The  ‘humps’  in  most  envelopes  are  unimodal  but  this  case  is  an  exception. 
As  shown  in  Section  5  either  peak  (4  or  6,  16  or  18)  will  serve  for  choosing  a 
submatrix.  However  it  is  vital  to  realize  that  indices  4  and  6  belong  to  the 
same  hump. 


Extremal  Vectors  in  Sj 

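Lemma 7 For any $\theta \in (\lambda_l, \lambda_{l+1})$

$$(\beta_1 \cdots \beta_j)\,\frac{y^{(1)}(j+1)}{y^{(1)}(1)} = \chi_j(\mu_l) + \tfrac{1}{2}\nu_l^2\,\chi_j''(\theta) + O\big((\lambda_{l+1}-\lambda_l)^3\big),$$

where

$$\mu_l = \lambda_l \cos^2\varphi + \lambda_{l+1}\sin^2\varphi, \qquad \nu_l = \sin 2\varphi \cdot \frac{\lambda_{l+1}-\lambda_l}{2}.$$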

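Proof. From the beginning of this section

$$y^{(1)} = z_l \cos\varphi + z_{l+1}\sin\varphi, \qquad \varphi \in [0, \pi/2].$$

By (4) in Section 3

$$y^{(1)}(j+1) = \frac{z_l(1)\,\chi_j(\lambda_l)}{\beta_1\cdots\beta_j}\cos\varphi + \frac{z_{l+1}(1)\,\chi_j(\lambda_{l+1})}{\beta_1\cdots\beta_j}\sin\varphi.$$

Since $\cos\varphi = z_l(1)/\mathcal{E}(1)$, $\sin\varphi = z_{l+1}(1)/\mathcal{E}(1)$ and $\mathcal{E}(1) = y^{(1)}(1)$,

$$(\beta_1\cdots\beta_j)\,\frac{y^{(1)}(j+1)}{y^{(1)}(1)} = \chi_j(\lambda_l)\cos^2\varphi + \chi_j(\lambda_{l+1})\sin^2\varphi.$$

Expand $\chi_j$ about any $\theta \in (\lambda_l, \lambda_{l+1})$ and use Lemma 3 to find

$$\chi_j(\lambda_l)\cos^2\varphi + \chi_j(\lambda_{l+1})\sin^2\varphi = \chi_j(\theta)(\cos^2\varphi + \sin^2\varphi) + \chi_j'(\theta)\,[\cos^2\varphi\,(\lambda_l - \theta) + \sin^2\varphi\,(\lambda_{l+1} - \theta)] + \tfrac{1}{2}\chi_j''(\theta)\,\nu(\theta)^2 + O\big((\lambda_{l+1}-\lambda_l)^3\big),$$

where $\nu(\theta)^2 = \cos^2\varphi\,(\lambda_l-\theta)^2 + \sin^2\varphi\,(\lambda_{l+1}-\theta)^2$.

The coefficient of $\chi_j'(\theta)$ is just $\mu_l - \theta$ and part of $\nu(\theta)^2$ is $(\mu_l - \theta)^2$; the remainder is $\nu_l^2$. Thus the right side simplifies to

$$\chi_j(\mu_l) + \tfrac{1}{2}\nu_l^2\,\chi_j''(\theta) + O\big((\lambda_{l+1}-\lambda_l)^3\big),$$

as claimed.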
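The expansion used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the test polynomial and the angle are hypothetical, and an arbitrary cubic stands in for the characteristic polynomial. For a quadratic the two sides agree exactly; for a cubic they differ by a term proportional to the cube of the eigenvalue gap.

```python
import math

def lemma7_sides(chi, d2chi, lam_lo, lam_hi, phi, theta):
    """Return (lhs, rhs) of the weighted-average expansion.

    lhs = chi(lam_lo) cos^2(phi) + chi(lam_hi) sin^2(phi)
    rhs = chi(mu) + 0.5 * nu^2 * chi''(theta)
    with mu the weighted mean of the two eigenvalues and
    nu = sin(2 phi) * (lam_hi - lam_lo) / 2.
    """
    c2 = math.cos(phi) ** 2
    s2 = math.sin(phi) ** 2
    mu = lam_lo * c2 + lam_hi * s2
    nu = math.sin(2.0 * phi) * (lam_hi - lam_lo) / 2.0
    lhs = chi(lam_lo) * c2 + chi(lam_hi) * s2
    rhs = chi(mu) + 0.5 * nu ** 2 * d2chi(theta)
    return lhs, rhs

def cubic(x):
    # Hypothetical stand-in for a characteristic polynomial.
    return x ** 3 - 2.0 * x

def cubic_dd(x):
    # Second derivative of `cubic`.
    return 6.0 * x

def mismatch(h, phi=0.6):
    """|lhs - rhs| for two 'eigenvalues' 1 and 1 + h, theta at the midpoint."""
    lo, hi = 1.0, 1.0 + h
    lhs, rhs = lemma7_sides(cubic, cubic_dd, lo, hi, phi, lo + h / 2.0)
    return abs(lhs - rhs)
```

Shrinking the gap h by a factor of 10 should shrink `mismatch(h)` by roughly a factor of 1000.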


For  comparison 


(/?!••  •  Pi) 


+  1) 
sO)(l) 


X,(6>^'^),  i  <  j, 
0,  i  >  j- 


The  closer  is  9'^'-^  to  pi  the  closer  is  ^  ^  j  to  Let  us  consider  the 

magnitude  of  d"  1)  which  we  approximate  by  0.  By  (6)  in  Section  3 


13,- --Pj  sy){l)sU){jW 


(18) 


Hence,  with  9  =  9^'-^  in  Lemma  7, 

+  ~ 

J/(1)(1)  ”  ^L)(l)sL)(j)^, 

2gap{9^-^)'  Pi---Pi  '  x'(^'=^') 

+  higher  order  terms. 

By  Theorem  1,  for  the  best  values  of  j, 

+  1)  w  (0^'-')  =  [v]  4-  {pi  -  9^-^f]  ^  . 
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Recall  that  y(^)(l)  =  5(1)  and  5(1)  <  ^(^^(l)  (but  «  if  5(1)  is  not  small). 
Thus 

r Pi  - 


y(l)(j  +  1)  = 


1 

+ 


S(j  + 1) 

I'l 


2  gap{9^'^) 


■  S{j  +  1)M2 


5(1) 

+  higher  order  terms] 


<  ^(i  +  i) 


p,  -  01b  1 


1^1 


M, 


where 


Mi  = 


i/(0ib)  "^2  gap{9^'-^)  ^ 

<  2(i  -  1). 


+ 


9ap{9^'-^)x'ji^^''^) 

The  relation  (19)  tells  us  that  it  is  essential  that  both  ratios 


Pi  -  9^-^ 


and 


(A/+1  -  A/)sin2(^ 


r/(0ib)  yap(0ib)  2gap{9^-^) 


(19) 


be  small. 

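Next consider $y^{(n)}$. Recall that

$$y^{(1)} = z_l \cos\varphi + z_{l+1}\sin\varphi, \qquad y^{(n)} = z_l\cos\psi - z_{l+1}\sin\psi,$$

where $\varphi \in (0, \pi/2)$. This choice of sign for $y^{(n)}$ is deliberate and yields $\psi \in (0, \pi/2)$. Since $z_l \cdot z_{l+1} = 0$

$$y^{(1)} \cdot y^{(n)} = \cos\varphi\cos\psi - \sin\varphi\sin\psi = \cos(\varphi + \psi).$$

By (6) in Section 3,

$$z_l(1)\,z_l(n)\,\chi'(\lambda_l) = \beta_1\cdots\beta_{n-1} = z_{l+1}(1)\,z_{l+1}(n)\,\chi'(\lambda_{l+1})$$

where $\chi = \chi^{1:n}$. Hence

$$\tan\psi = \frac{-z_{l+1}(n)}{z_l(n)} = \frac{-z_l(1)}{z_{l+1}(1)}\cdot\frac{\chi'(\lambda_l)}{\chi'(\lambda_{l+1})} = -\frac{1}{\tan\varphi}\cdot\frac{(\lambda_l - \lambda_{l+1})\,\prod''(\lambda_l - \lambda_i)}{(\lambda_{l+1} - \lambda_l)\,\prod''(\lambda_{l+1} - \lambda_i)}$$

where $\prod'' = \prod_{i=1,\ i \ne l,\,l+1}^{n}$. So

$$\tan\varphi\cdot\tan\psi = \prod{}''\left(1 + \frac{\lambda_{l+1}-\lambda_l}{\lambda_i - \lambda_{l+1}}\right) = 1 + (\lambda_{l+1}-\lambda_l)\sum{}''(\lambda_i - \lambda_{l+1})^{-1} + O\big((\lambda_{l+1}-\lambda_l)^2\big)$$

and

$$1 - \tan\varphi\cdot\tan\psi = O\left(\frac{\lambda_{l+1}-\lambda_l}{gap}\right), \qquad gap = \min_i|\lambda_i - \lambda_l|,\quad i \ne l,\ l+1.$$

This establishes

Lemma 8

$$y^{(1)}\cdot y^{(n)} = \cos\varphi\cos\psi\,(1 - \tan\varphi\cdot\tan\psi) = O\left(\frac{\lambda_{l+1}-\lambda_l}{gap}\right).$$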

7  The  Submatrix  Theorems 

We consider the case when a pair of adjacent eigenvalues, call them $\lambda_-$ and $\lambda_+$, are much closer together than they are to any other eigenvalues.

We claim that there is a submatrix $T^{1:j}$ with two properties

(a) It has a well isolated eigenvalue $\theta$ in $(\lambda_-, \lambda_+)$.

(b) The normalized eigenvector s for $\theta$ has the property that $\begin{pmatrix} s \\ 0 \end{pmatrix}$ is very close to the invariant subspace (under T) associated with $\lambda_-$ and $\lambda_+$.

If we apply this property to the trailing submatrices $T^{k:n}$ for some k, we obtain another vector $\begin{pmatrix} 0 \\ t \end{pmatrix}$ that is also very close to the desired invariant subspace for $\lambda_\pm$. Thus $\left\{ \begin{pmatrix} s \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ t \end{pmatrix} \right\}$ is an excellent basis for this subspace.

The smaller is $\lambda_+ - \lambda_-$, the smaller is the dot product $\begin{pmatrix} s \\ 0 \end{pmatrix} \cdot \begin{pmatrix} 0 \\ t \end{pmatrix}$.

We now prepare to make precise the preceding claims.
Paige's Persistence theorem (in Section 3) says that there is at least one Ritz value of $T^{1:j+i}$ in the Ritz interval

$$I_l^{(j)} := [\theta_l^{(j)} - \beta_j\omega_l,\ \theta_l^{(j)} + \beta_j\omega_l], \qquad \omega_l = \omega_l^{(j)} = |s_l^{(j)}(j)|, \tag{20}$$

for all i = 1, 2, ..., n - j. When $\beta_j\omega_l$ is relatively large, as happens when j is small, then $I_l^{(j)}$ may contain several eigenvalues. However as $\omega_l$ decreases then $I_l^{(j)}$ becomes disjoint from all other $I_i^{(j)}$ and, as j increases, shrinks onto a single eigenvalue.

Our interest now centers on those rare cases when $I_l^{(j)}$ is isolated from its neighbors and small and yet 2 eigenvalues get into $I_l^{(j)}$. How close can they be to each other? We will show that $\beta_j\omega_l$ acts as a barrier in the sense that the two values, $\lambda_-$ and $\lambda_+$, satisfy


A+  -  A_  >  OiM. 


So,  if  A+  -  A_  is  tiny  then  the  vector 


0 


has  a  small  residual  norm 


since 


More  notation  is  needed  to  state  Theorem  2. 


Secular  Equation 

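For some j < n, write

$$T = \begin{pmatrix} T^{1:j} & \cdot & \\ \cdot & \alpha_{j+1} & \cdot \\ & \cdot & T^{j+2:n} \end{pmatrix}, \qquad \cdot = \beta_j \text{ or } \beta_{j+1}.$$

If $\zeta$ is not an eigenvalue of $T^{1:j}$ nor of $T^{j+2:n}$ then

$$\det(T - \zeta I) = \det(T^{1:j} - \zeta I)\,\big[\alpha_{j+1} - \zeta - \beta_j^2\, e_j^*(T^{1:j} - \zeta I)^{-1}e_j - \beta_{j+1}^2\, e_1^*(T^{j+2:n} - \zeta I)^{-1}e_1\big]\,\det(T^{j+2:n} - \zeta I).$$

Consequently the eigenvalues of T that are not eigenvalues of $T^{1:j}$ nor of $T^{j+2:n}$ must satisfy the nonlinear secular equation

$$\sigma(\lambda) := \alpha_{j+1} - \lambda - \beta_j^2\, e_j^*(T^{1:j} - \lambda I)^{-1}e_j - \beta_{j+1}^2\, e_1^*(T^{j+2:n} - \lambda I)^{-1}e_1 = 0.$$

The middle term will be written as

$$\beta_j^2\, e_j^*(T^{1:j} - \zeta I)^{-1}e_j = \frac{\beta_j^2\omega_l^2}{\theta_l^{(j)} - \zeta} + \psi_l(\zeta)$$

where $\omega_l = \omega_l^{(j)} = |s_l^{(j)}(j)|$ and $\theta_l^{(j)} \in (\lambda_-, \lambda_+)$, and

$$\psi_l(\zeta) := \beta_j^2 \sum_{i \ne l} \frac{\omega_i^2}{\theta_i^{(j)} - \zeta} \tag{21}$$

and the final term as

$$\beta_{j+1}^2\, e_1^*(T^{j+2:n} - \zeta I)^{-1}e_1 = \tau_{j+2,n}(\zeta). \tag{22}$$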
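The secular function can be evaluated without forming any inverse, because the last diagonal entry of the inverse of a leading tridiagonal block is the reciprocal of the last pivot of its top-down triangular factorization, and likewise for the first entry of a trailing block with the bottom-up factorization. The following is a minimal sketch under that observation, not the authors' code; the function name and the 0-based index `k` (the row called j+1 in the text) are assumptions made for illustration.

```python
import math

def secular(alpha, beta, k, lam):
    """Evaluate sigma(lam) for the row with 0-based index k.

    alpha: diagonal entries, beta: off-diagonal entries (beta[i] couples
    rows i and i+1).  lam must not be an eigenvalue of the leading block
    (rows 0..k-1) or of the trailing block (rows k+1..n-1); otherwise a
    pivot vanishes and the division below fails.
    """
    n = len(alpha)
    sigma = alpha[k] - lam
    if k > 0:
        # Top-down pivots of the leading block shifted by lam.
        d = alpha[0] - lam
        for i in range(1, k):
            d = alpha[i] - lam - beta[i - 1] ** 2 / d
        sigma -= beta[k - 1] ** 2 / d
    if k < n - 1:
        # Bottom-up pivots of the trailing block shifted by lam.
        d = alpha[n - 1] - lam
        for i in range(n - 2, k, -1):
            d = alpha[i] - lam - beta[i] ** 2 / d
        sigma -= beta[k] ** 2 / d
    return sigma

# The 5 x 5 matrix with diagonal 2 and off-diagonal -1 has the known
# eigenvalues 2 - 2 cos(k pi / 6); two of them are 2 - sqrt(3) and 2 + sqrt(3).
alpha = [2.0] * 5
beta = [-1.0] * 4
residual_small = secular(alpha, beta, 2, 2.0 - math.sqrt(3.0))
residual_large = secular(alpha, beta, 2, 2.0 + math.sqrt(3.0))
```

Both residuals vanish up to rounding, as the secular equation requires for eigenvalues of T that are not eigenvalues of either block.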

The following results bound the residual norm $\beta_j\omega_l$ in terms of the separation of $\theta_l^{(j)}$ from the other zeros of $\chi_j$. Sometimes, these gaps are greater than the separation of $\theta_l^{(j)}$ from eigenvalues $\lambda$ other than $\lambda_\pm$.

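Theorem 2 (Double Occupancy) Let $T_n$ denote a symmetric unreduced tridiagonal matrix. Let $\lambda_-$ and $\lambda_+$ be two adjacent eigenvalues. The notation developed in this section is in force. Consider those indices j, 1 < j < n, that satisfy the hypothesis

(H) There is a single Ritz value $\theta_l^{(j)}$ in the open interval $(\lambda_-, \lambda_+)$ and neither end point is a Ritz value.

For such j test the condition for double occupancy of $I_l^{(j)}$ in terms of (21) and (22):

(DO) $\psi_l(\lambda_-) + \tau_{j+2,n}(\lambda_-) < \alpha_{j+1} - \theta_l^{(j)} < \psi_l(\lambda_+) + \tau_{j+2,n}(\lambda_+)$.

If (DO) does not hold (only one eigenvalue in $I_l^{(j)}$) then

$$\lambda_+ - \lambda_- > \beta_j\omega_l.$$

If (DO) holds (both eigenvalues in $I_l^{(j)}$) then

$$2\beta_j\omega_l > \lambda_+ - \lambda_- \ge \frac{2\beta_j\omega_l}{\sqrt{1 + 2G_j}}$$

where

$$G_j := \frac{\psi_l(\lambda_+) - \psi_l(\lambda_-) + \tau_{j+2,n}(\lambda_+) - \tau_{j+2,n}(\lambda_-)}{2(\lambda_+ - \lambda_-)}.$$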
Remark 1 Theorem 2 does not give lower bounds on $\lambda_+ - \lambda_-$ because G is a function of $\lambda_-$ and $\lambda_+$.

Remark 2 $G_j = \tfrac{1}{2}[\psi_l'(\eta_1) + \tau_{j+2,n}'(\eta_2)]$ for some values $\eta_1$ and $\eta_2$ in $(\lambda_-, \lambda_+)$. Moreover $\psi_l$ and $\tau_{j+2,n}$ are rational functions that are monotonic increasing between poles. By (H) $(\lambda_-, \lambda_+)$ is between poles of $\psi_l$ and also $(\lambda_-, \lambda_+)$ is between poles of $\tau_{j+2,n}$. Hence $G_j > 0$ on $(\lambda_-, \lambda_+)$. Our interest is in those j for which $G_j$ is smallest.

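Proof (of Theorem 2). Partition $T - \zeta I$ into a 3 x 3 structure with blocks 1 : j, j+1 : j+1, j+2 : n and then form the Schur complement $\sigma(\zeta)$ of entry (j+1, j+1). Thus $T - \zeta I$ is congruent to

$$(T^{1:j} - \zeta I) \oplus \sigma(\zeta) \oplus (T^{j+2:n} - \zeta I)$$

where

$$\sigma(\zeta) = \alpha_{j+1} - \zeta - \beta_j^2\, e_j^*(T^{1:j} - \zeta I)^{-1}e_j - \beta_{j+1}^2\, e_1^*(T^{j+2:n} - \zeta I)^{-1}e_1.$$

$\sigma = 0$ is sometimes called the secular equation.

Observe that $T^{1:j} \oplus T^{j+2:n}$ is the submatrix of T obtained by deleting row and column j+1. By Cauchy's Interlace theorem the Ritz values from $T^{1:j}$ and $T^{j+2:n}$ interlace the eigenvalues of T. By (H) $(\lambda_-, \lambda_+)$ contains $\theta_l^{(j)}$ and so cannot contain any Ritz value of $T^{j+2:n}$. By Theorem 5 in [5] $\lambda_\pm$ can only be Ritz values if both a Ritz value of $T^{1:j}$ and a Ritz value of $T^{j+2:n}$ coincide at $\lambda_\pm$. Thus (H) rules out this possibility and so

$$[\lambda_-, \lambda_+], \text{ the closed interval, contains no Ritz values of } T^{j+2:n}. \tag{23}$$

The set of indices j that satisfy (H) includes j = n - 1, by Cauchy's Interlace theorem, and so is nonempty. When j is too small then it is a Ritz value of $T^{j+2:n}$ that lies in $(\lambda_-, \lambda_+)$ and not a Ritz value of $T^{1:j}$.

Let $\theta_l^{(j)}$, $1 \le l \le j$, denote the Ritz value in $(\lambda_-, \lambda_+)$ and separate it from the other j-level Ritz values. Thus

$$\beta_j^2\, e_j^*(T^{1:j} - \zeta I)^{-1}e_j = \frac{\beta_j^2\omega_l^2}{\theta_l^{(j)} - \zeta} + \psi_l(\zeta)$$

where $\psi_l$ is defined by (21).

Recall that $(s_l^{(j)}(1), \ldots, s_l^{(j)}(j))$ is a normalized Ritz vector for $\theta_l^{(j)}$ and $\omega_l = \omega_l^{(j)} = |s_l^{(j)}(j)|$. The final term in $\sigma(\zeta)$ is $\tau_{j+2,n}(\zeta)$ as defined in (22).

By (H) $\lambda_-$ and $\lambda_+$ are not eigenvalues of $T^{1:j}$. By (23) they are not eigenvalues of $T^{j+2:n}$. Consequently $\lambda_-$ and $\lambda_+$ must satisfy the secular equation

$$\sigma(\lambda_\pm) := \alpha_{j+1} - \lambda_\pm - \frac{\beta_j^2\omega_l^2}{\theta_l^{(j)} - \lambda_\pm} - \psi_l(\lambda_\pm) - \tau_{j+2,n}(\lambda_\pm) = 0.$$

To simplify the analysis that follows write

$$\theta = \theta_l^{(j)}, \quad \omega = \omega_l^{(j)}, \quad \beta = \beta_j, \quad \alpha = \alpha_{j+1}, \quad \psi = \psi_l, \quad \tau = \tau_{j+2,n}.$$

Next rewrite the above equations in terms of the positive terms $\lambda_+ - \theta$ and $\theta - \lambda_-$ to find

$$(\lambda_+ - \theta)^2 + 2E(\lambda_+ - \theta) = \beta^2\omega^2, \tag{24}$$

$$(\theta - \lambda_-)^2 + 2F(\theta - \lambda_-) = \beta^2\omega^2, \tag{25}$$

where

$$E = E(\lambda_+) = [\psi(\lambda_+) + \tau(\lambda_+) + \theta - \alpha]/2,$$

$$F = F(\lambda_-) = [\alpha - \theta - \psi(\lambda_-) - \tau(\lambda_-)]/2,$$

and

$$E + F = [\psi(\lambda_+) - \psi(\lambda_-) + \tau(\lambda_+) - \tau(\lambda_-)]/2.$$

By Remark 2 both $\psi$ and $\tau$ are monotone increasing in $(\lambda_-, \lambda_+)$ and so

$$E + F > 0.$$

The quadratic $x^2 + 2Ex - \beta^2\omega^2 = 0$ has two real roots whose product is $-\beta^2\omega^2$.

First we establish the case when (DO) does not hold.

If E < 0 then the positive root is the larger (in magnitude) and must exceed $\beta\omega$. Hence

$$\lambda_+ - \theta > \beta\omega, \qquad \theta - \lambda_- > 0$$

together imply

$$\lambda_+ - \lambda_- > \beta\omega.$$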
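Similarly, if F < 0, then $\theta - \lambda_- > \beta\omega$, $\lambda_+ - \theta > 0$, and, again

$$\lambda_+ - \lambda_- > \beta\omega,$$

as claimed. Differences smaller than $\beta\omega$ can occur only when both $E = E(\lambda_+) > 0$ and $F = F(\lambda_-) > 0$; that is, Condition (DO) holds. This is the case considered in the remaining analysis.

From (24) and (25)

$$\lambda_+ - \theta = \beta^2\omega^2 / \{\sqrt{\beta^2\omega^2 + E^2} + E\}, \tag{26}$$

$$\theta - \lambda_- = \beta^2\omega^2 / \{\sqrt{\beta^2\omega^2 + F^2} + F\}.$$

Thus

$$\frac{\lambda_+ - \lambda_-}{\beta^2\omega^2} = \frac{1}{\sqrt{\beta^2\omega^2 + E^2} + E} + \frac{1}{\sqrt{\beta^2\omega^2 + F^2} + F}. \tag{27}$$

To simplify this equation note that the function

$$f(x) := \{x + \sqrt{x^2 + \beta^2\omega^2}\}^{-1} = \frac{\sqrt{x^2 + \beta^2\omega^2} - x}{\beta^2\omega^2}$$

is monotone decreasing and concave upward ($f'' > 0$) for x > 0. By concavity and the positivity of E and F,

$$\frac{f(E) + f(F)}{2} \ge f\left(\frac{E + F}{2}\right);$$

in other words, from (27),

$$\frac{\lambda_+ - \lambda_-}{2\beta^2\omega^2} = \frac{f(E) + f(F)}{2} \ge f\left(\frac{E + F}{2}\right). \tag{28}$$

To simplify this relation write

$$\frac{E + F}{2} = \frac{1}{4}[\psi(\lambda_+) - \psi(\lambda_-) + \tau(\lambda_+) - \tau(\lambda_-)] = \left(\frac{\lambda_+ - \lambda_-}{2}\right) G,$$

where $G = G(\lambda_-, \lambda_+)$ is given in the statement of the theorem.

Let $r = (\lambda_+ - \lambda_-)/(2\beta\omega)$ and multiply (28) by $\beta\omega$ to find

$$r \ge \frac{1}{\sqrt{1 + (rG)^2} + rG} = \sqrt{1 + (rG)^2} - rG.$$

Add rG to each side, square and subtract $r^2G^2$ to find

$$r^2(1 + 2G) \ge 1$$

as claimed.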
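The chain from (24) and (25) to the final inequality can be followed with plain numbers. The sketch below is an illustration only: the function name and the sample values of E, F and the product βω are hypothetical, and the positive roots are taken from (26). It returns the quantity r²(1 + 2G), which should be at least 1, with equality when E = F.

```python
import math

def theorem2_margin(E, F, bw):
    """Return r^2 (1 + 2G) for given E > 0, F > 0 and bw = beta * omega > 0.

    up   = lambda_plus - theta   (positive root of x^2 + 2 E x = bw^2)
    down = theta - lambda_minus  (positive root of x^2 + 2 F x = bw^2)
    """
    up = bw ** 2 / (math.sqrt(bw ** 2 + E ** 2) + E)
    down = bw ** 2 / (math.sqrt(bw ** 2 + F ** 2) + F)
    width = up + down                 # lambda_plus - lambda_minus
    G = (E + F) / width               # from (E + F)/2 = (width/2) G
    r = width / (2.0 * bw)
    return r * r * (1.0 + 2.0 * G)

margin_unequal = theorem2_margin(0.3, 0.7, 0.1)
margin_equal = theorem2_margin(0.5, 0.5, 0.1)
```

With E and F unequal the margin exceeds 1 strictly, which is the convexity step (28) at work.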

Remark 3 In the cases of interest to us $\beta\omega$ is small and

$$G \approx \tfrac{1}{2}\big(\psi'(\theta) + \tau'(\theta)\big).$$

The next task is to turn the inequalities of Theorem 2 into lower bounds on $\lambda_+ - \lambda_-$ by obtaining upper bounds on G. These bounds depend on the gap between $\theta_l^{(j)}$ and the other Ritz values of $T^{1:j}$ and $T^{j+2:n}$.

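Theorem 3 Let $\lambda_-$ and $\lambda_+$ be two consecutive eigenvalues of a symmetric unreduced tridiagonal matrix $T_n$. The notation developed in this section is in force. For each j, 1 < j < n, that satisfies the two hypotheses

(H) There is a single Ritz value $\theta_l^{(j)}$ in the open interval $(\lambda_-, \lambda_+)$

(GAP) $2\beta_j\omega_l < gap(l)$, where

$$gap(l) = \min\big\{\,|\theta_i^{(j)} - \theta_l^{(j)}|,\ i \ne l;\ \ |\theta_i^{(j+2:n)} - \theta_l^{(j)}|,\ i = 1, \ldots, n-j-1\,\big\}, \qquad \omega_l = |s_l^{(j)}(j)|,$$

then,

either $(\lambda_-, \lambda_+) \not\subset I_l^{(j)} := [\theta_l^{(j)} - \beta_j\omega_l,\ \theta_l^{(j)} + \beta_j\omega_l]$,

in which case $\lambda_+ - \lambda_- > \beta_j\omega_l$,

or $(\lambda_-, \lambda_+) \subset I_l^{(j)}$,

in which case $\lambda_+ - \lambda_- \ge 2\beta_j\omega_l / \sqrt{1 + 2\Gamma_j}$;

where

$$\Gamma_j = \frac{\beta_j^2(1 - \omega_l^2) + \beta_{j+1}^2}{2\,[gap(l) - \beta_j\omega_l]^2}.$$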
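For orientation, the quantities in Theorem 3 are cheap to evaluate once the Ritz data are known. The sketch below is a hedged illustration: the function name and argument names are hypothetical, the inputs are assumed to be supplied by the caller (nothing here computes Ritz values), and the formula for Γ_j is the one stated in the theorem as read from the text.

```python
import math

def theorem3_lower_bound(beta_j, beta_next, omega, gap):
    """Return (Gamma_j, lower bound on lambda_plus - lambda_minus).

    beta_j, beta_next: off-diagonal entries beta_j and beta_{j+1}
    omega: |last entry| of the normalized Ritz vector
    gap: separation gap(l) of the Ritz value from the other Ritz values
    Raises ValueError when hypothesis (GAP), 2*beta_j*omega < gap, fails.
    """
    bw = beta_j * omega
    if not 2.0 * bw < gap:
        raise ValueError("hypothesis (GAP) is not satisfied")
    gamma = (beta_j ** 2 * (1.0 - omega ** 2) + beta_next ** 2) / (
        2.0 * (gap - bw) ** 2
    )
    return gamma, 2.0 * bw / math.sqrt(1.0 + 2.0 * gamma)

# Hypothetical data: unit off-diagonals, tiny last entry, unit gap.
gamma, bound = theorem3_lower_bound(1.0, 1.0, 0.01, 1.0)
```

When Γ_j is O(1) the lower bound is a modest fraction of 2β_jω_l, so a tiny eigenvalue separation forces a tiny last entry in the Ritz vector.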
Remark 4 The observations at the end of Section 3 say that, on average, $\Gamma_j$ is O(1): the geometric mean of the $\beta_i$ is less than the geometric mean of the gaps $|\theta_i - \theta_l|$, $i \ne l$.

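Proof. It is only necessary to bound the quantity G from Theorem 2 and the same notation will be adopted.

For some $\eta \in (\lambda_-, \lambda_+)$,

$$\frac{\psi(\lambda_+) - \psi(\lambda_-)}{\lambda_+ - \lambda_-} = \psi'(\eta) = \beta^2 \sum_{i \ne l} \frac{\omega_i^2}{(\theta_i^{(j)} - \eta)^2} = \frac{\beta^2(1 - \omega^2)}{h^{(j)}} \tag{29}$$

where $h^{(j)}$ is a weighted harmonic mean of $\{(\theta_i^{(j)} - \eta)^2,\ i \ne l\}$. Since the minimum value never exceeds any mean

$$h^{(j)} \ge \min_{i \ne l}(\theta_i^{(j)} - \eta)^2 = \min\{(\theta_{l+1}^{(j)} - \eta)^2,\ (\eta - \theta_{l-1}^{(j)})^2\}.$$

There are two cases. If $(\lambda_-, \lambda_+) \not\subset I_l^{(j)}$ then $\lambda_+ - \lambda_- > \beta_j\omega_l$ for the same reasons given in Theorem 2. In the contrary case, one has

$$\eta \in (\lambda_-, \lambda_+) \subset I_l^{(j)}.$$

Now (GAP) guarantees that

$$\min\{\theta_{l+1}^{(j)} - \eta,\ \eta - \theta_{l-1}^{(j)}\} \ge \min\{\theta_{l+1}^{(j)} - \theta_l^{(j)},\ \theta_l^{(j)} - \theta_{l-1}^{(j)}\} - \beta\omega > \beta\omega.$$

Similarly, for some $\zeta \in (\lambda_-, \lambda_+)$

$$\frac{\tau_{j+2,n}(\lambda_+) - \tau_{j+2,n}(\lambda_-)}{\lambda_+ - \lambda_-} = \tau_{j+2,n}'(\zeta) = \beta_{j+1}^2 \sum_i \frac{s_i^{(j+2:n)}(j+2)^2}{(\theta_i^{(j+2:n)} - \zeta)^2} = \frac{\beta_{j+1}^2}{h^{(j+2,n)}} \tag{30}$$

where $h^{(j+2,n)}$ is a weighted harmonic mean of $\{(\theta_i^{(j+2:n)} - \zeta)^2\}$ and so (GAP) yields

$$h^{(j+2,n)} \ge \min_i(\theta_i^{(j+2:n)} - \zeta)^2 \ge \big(\min_i|\theta_i^{(j+2:n)} - \theta_l^{(j)}| - \beta\omega\big)^2 > \beta^2\omega^2.$$

The definition of gap(l) is made so that

$$h^{(j)} \ge (gap(l) - \beta\omega)^2, \qquad h^{(j+2,n)} \ge (gap(l) - \beta\omega)^2,$$

and thus, using (29) and (30),

$$2G \le \frac{\beta_j^2(1 - \omega_l^2) + \beta_{j+1}^2}{(gap(l) - \beta_j\omega_l)^2} =: 2\Gamma_j$$

and application of this inequality into the second inequality of Theorem 2 establishes Theorem 3.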


Remark 5 When we first established Theorems 2 and 3 we were concerned that the gap quantities involved Ritz values $\theta_i^{(j)}$ etc. and not eigenvalues $\lambda_i$. However the study of some challenging examples has shown that, in fact, the gaps involving Ritz values do lead to an important extension. In some examples 100 or 200 eigenvalues, each separated from its neighbor by about 10 or 20 ulps (units in the last place), combine to form a cluster of nonnegligible width (e.g. 1500 ulps). We find that for each submatrix, taken with its nearest neighbor, the gaps of Theorem 3 are vastly greater than quantities such as $\theta_l^{(j)} - \lambda_{l-1}$ and $\lambda_{l+2} - \theta_l^{(j)}$. In other words, Theorem 3 can be applied, submatrix by submatrix, to clusters whose total width is greater than $O(\epsilon \cdot \mathrm{spread})$.

Recall  that  the  envelope  bound  in  Section  6  shows  that  there  are  ^-values 
for  which  our  gap  quantities  Gj  satisfy 


2Gj^ 


1 

£{j  +  1)2 
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In  fact  each  hump  in  an  envelope  contributes  1  to  on  average,  and  if 

the  hump  is  spiky  5(i  +  1)  «  1  while  if  four  consecutive  entries  contribute 
to  the  hump  then  the  greatest  of  the  four  exceeds  1/2. 

8  The  Overlap  Theorems 

We need expressions for the entries of the vectors constructed to span the invariant subspace corresponding to a close pair of eigenvalues $\lambda_-$ and $\lambda_+$ of $T = T^{1:n}$. Recall that $\chi^{l:m}(\zeta)$ is the characteristic polynomial of the submatrix $T^{l:m}$.

The index m (not unique) is determined so that $T^{1:m}$ has a well isolated Ritz value $\theta^{1:m}$ in $(\lambda_-, \lambda_+)$ whose normalized Ritz vector $s^{1:m}$ has a small last entry. More precisely

$$\beta_m|s^{1:m}(m)| = O(\lambda_+ - \lambda_-).$$

Trailing submatrices are used to find a suitable index l (not unique) so that $T^{l:n}$ has a well isolated Ritz value $\theta^{l:n}$ in $(\lambda_-, \lambda_+)$ whose normalized Ritz vector $s^{l:n}$ has a small first entry. More precisely

$$\beta_{l-1}|s^{l:n}(l)| = O(\lambda_+ - \lambda_-).$$

The spanning vectors are

$$p = \begin{pmatrix} s^{1:m} \\ 0 \end{pmatrix}, \qquad q = \begin{pmatrix} 0 \\ s^{l:n} \end{pmatrix}$$

and the indices 1 < l < n, 1 < m < n, play a crucial role in the analysis. To avoid index troubles the entries in $s^{l:n}$ are labelled from l to n, not from 1 to n - l + 1.

There are cases in which m < l and then p and q are orthogonal because their supports are disjoint. So we turn to the other cases when

$$1 < l \le m < n$$

and study $|p(j)q(j)|$ for $l \le j \le m$. We will show that p and q are nearly orthogonal. There are formulae for the magnitudes of p(i) and q(i):

$$p(i)^2 = \chi_{1,i-1}(\theta)\,\chi_{i+1,m}(\theta)\,/\,\chi_{1,m}'(\theta), \qquad i = 1, \ldots, m,$$

where $\theta = \theta^{1:m}$ is p's Ritz value. Similarly

$$q(i)^2 = \chi_{l,i-1}(\varphi)\,\chi_{i+1,n}(\varphi)\,/\,\chi_{l,n}'(\varphi), \qquad i = l, \ldots, n,$$

where $\varphi = \theta^{l:n}$ is q's Ritz value.

It is readily verified that for tridiagonal matrices

$$\chi_{1(i)n}(\zeta) := \chi_{1,i-1}(\zeta)\,\chi_{i+1,n}(\zeta)$$

is the characteristic polynomial of the submatrix of $T^{1:n}$ obtained by deleting row and column i. Hence, by Cauchy's Interlace theorem the Ritz values of $T^{1:i-1}$ and $T^{i+1:n}$ together interlace (weakly) the Ritz values of $T^{1:n}$. In particular, for each i, the open interval $(\lambda_-, \lambda_+)$ contains either a Ritz value of $T^{1:i-1}$ or a Ritz value of $T^{i+1:n}$, but not both. In an exceptional case the open interval is free of Ritz values just when Ritz values of $T^{1:i-1}$ and $T^{i+1:n}$ coincide at either $\lambda_-$ or $\lambda_+$ (but not both) and in that case p(i)q(i) = 0.

We can simplify some expressions considerably by the following convention:

If a Ritz value $\theta_i^{l:m} \in [\lambda_-, \lambda_+]$ then the index i is omitted. Further $\theta_+^{l:m}$ denotes the smallest Ritz value of $T^{l:m}$ exceeding $\lambda_+$ and $\theta_-^{l:m}$ denotes the largest Ritz value of $T^{l:m}$ less than $\lambda_-$.


We  need  a  result  from  [3]. 


Theorem  4  With  the  notation  developed  above  let  Tsi  =  SjA,-,  s*Si  =  1 
be  an  eigenvector  equation.  Let  T^^'>  denote  the  submatrix  obtained  from  T 
by  deleting  row  and  column  j;  its  spectrum  is  9^f^<...<  Then  for 

I  <  i  <  n, 

, ^  -  A.  -  A, 

Si{j)  -  - 


A,  —  Ai_i  A,.(.i  —  Xi  A,- 


+2 


A, 


The  bound  for  i  =  1  and  i  =  n  is  obtained  by  omitting  quotients  with  out-of- 
bounds  indices. 


Proof.  For  tridiagonal  matrices,  see  [4], 

'  x'(A,)  n;=.,„^.(A.-A„)' 
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By  Cauchy’s  Interlace  theorem 

Afc-i  <  e’i'l,  <\k< 


So,  for  each  k  <  i,  the  quotient  <  1  and,  for  each  k  >  i,  the  quotient 

<;  1,  As  \k  —  i|  increases  the  quotients  become  close  to  1.  By 

discarding  all  but  the  smallest  two  or  three  quotients  the  upper  bound  is 

obtained.  rri-m 

In  what  follows  we  shall  apply  Theorem  4  to  submatrices  such  as  T 

and  One  of  the  principal  concerns  in  the  theorem  proved  below  is  the 

location  of  Ritz  values  such  as  or  By  (31)  6^^  is  either  6^^ 

or  and  we  shall  be  concerned  with  both  cases.  Recall  that  only  one 

of  0+^"^  and  lies  in  (A+,A++)  and  the  other  exceeds  A++,  the  next 

eigenvalue  of  T  greater  than  A+. 

Figure  5  is  worth  contemplating  before  reading  the  Overlap  theorem.  It 
shows  the  intervals  in  which  and  9^^  will  lie  as  j  varies. 


Theorem 5 (Overlap) Let T be n x n, symmetric, unreduced, and tridiagonal. Suppose that adjacent eigenvalues $\lambda_-$ and $\lambda_+$ of T are well enough separated from the remaining spectrum to yield the indices l and m and the vectors p and q described at the beginning of the section.

For each j, $l \le j \le m$,


\p{Mi)\  <  2''- 


3/4 

+  0 


) 


where 

13  =  min{/3j_i,^j}, 
gap  =  rmn{gap{l),gap{m)}, 
gap{l)  = 

gap{m)  =  min{0'=“  - 

Proof.  First  confine  attention  to  those  j  values  such  that  €  (A_,A+). 

Consider  the  expression  for  p(i)^  in  Theorem  4  using  ™  and  extract  the 
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three  smallest  terms  in  the  product  to  find 


pU?  < 


Ql:m_0l{j)Tn 

0l-rn^0l{j)m 


$l'Tn^Ql:m 


0Vm_0l:m 


if 


otherwise. 


(32) 


Thus  the  middle  term  itself  is  bounded  by  (A+  —  X^)(gap{Tn).  A  smaller 
bound  emerges  by  considering  either  a  neighboring  term  from  p{jY  or  the 
smallest  term  in  qUY-  Without  loss  of  generality  we  suppose  that  the  closest 
Ritz  values  outside  (A_,  A+)  are  on  the  right.  By  Cauchy’s  Interlace  theorem 
the  open  interval  (A+,A++)  contains  either  or  9!^^'""  but  not  both. 

Case  1;  €  (A+,  A++). 

To  obtain  a  bound  better  than  A+  -  A_  on  -  9'^-^-'^  \  consider  6'^=^“^  as 
an  approximation  to  and  apply  the  Gap  theorem  for  Rayleigh  quotients, 
see  [4]. 

|^i:m  _  |  <  min  (  A+  -  A_ ,  - — - 1  (33) 

where 

i  =  ^  *  Q  )  ,  i^j-i  =  -  1)1. 

and 

gap{m,j  -  1)  -  min  -  9]:^} 

For  future  application  note  that 


gap{mj  -  1)  =  gap{m) 


1  +  0 


A+-A-\ 

gapim)  ) 


(34) 


since  9^'^~'^  and  0^'’”  lie  in  (A_,A+).  To  bound  ||r||  we  apply  the  Double 
Occupancy  Theorem,  proved  above,  not  to  but  to  T.  Single  occupancy 
guarantees  that  ||r||  <  A+  -  A_  and  hence 

101-  _  0i:i-i|  <  ^  [l  +  0  ~  , 

gap{Tn)  [  \9mm)J\ 
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so  that  |p(j)|  <  (A+  -  X-)fgap.  This  is  already  tighter  than  the  bound  to  be 
established.  Double  occupancy  yields  a  weaker  bound: 

l|r||'“<(l  +  2(?,_i)(A+-A_f  (35) 


where 


Gj-l  —  2 

p,  =  -(A+  +  A_). 


1  +  0 


V  gap{m) 


Moreover,  from  Remark  3  in  Section  7  and  (29),  (30), 


_ 

r’{p)  <  I^T+I&Z 


/i, 


(36) 

(37) 


In  Ca5e  1,  >  t'  and  putting  (35)  into  (33)  yields 

a+-A_  .  /  ,,  Ae-A-I  [  /At  -  A-\ 

(38) 


gap{m)  '  gap{m) 

Now  we  must  use  the  right  hand  term  in  (32),  noting  that 


0^j)m 


0l:m 


01^m  _  0V.m 


^W-1  _  0l:m 
0Um  _  01 -.m 


eT"  - 

0Vm  _  0V.m 


^  /A+  -  A_ 

*  ( Sap(m) 


■  (39) 


In  the  analysis  to  follow  we  shall  drop  the  1  from  1  +  2Gj-i  because  it 
contributes  only  a  higher  order  term  to  the  bounds.  Insert  (36)  into  (38)  and 
multiply  by  (39)  to  find 


pU?  < 

< 


Ql-.m  _  gnm  —  01:771 


(A+-A-)  .  f,  2^li  (A-h-A-)I 

gap{m)  |  -  ;z)2  gap{m.)  J 


(A+-A_)  .  f  -  fi  A+  -  A- 

gap{m)  1  gap{m)  ’  -  p  gap(my 


^/A+-A_' 

^  (  gap{m)  , 


+  0 


V  gapim)  , 


Bound  the  min  by  the  geometric  mean  to  find 

A 


( <  V2  ~  ( '^+  ~  ^  ^ 

^  gap{m)  gap{m)  \gap{m)  ) 

a+-A-M_^q|'P+-_^ 


1  +  0 


gap{m) 

Since  q{jy  <  1  the  claimed  bound  holds  in  Case  1. 


(40) 


(41) 


Case  2:  €  (A+,  A++). 

The  argument  is  similar  to  Case  1  but  now  it  is  q{jy  that  offsets  a  large 
value  for  Gj-i.  In  Case  2,  r'  >  V’'  and  so  (38)  yields 


p{j?  < 


(A+-A-) 

gap{m) 


min  <  1, 


213] 


A+  -  A_ 


+  l:n 


1  +  0 


(K  -  A- 
V  gapim) 


(K 
+  0 


-  pY  gap{'^) 
/a+-a_ 

V  ^ap(m) 


The  smallest  term  in  q^jY  gives 


g{3?  < 


Qj+\-n  _ 
0hn  _  0l:n 


gap{l) 


1  +  0 


/A^-A_ 
V  5ap(0 


(42) 


Now  take  the  product  of  (40)  and  (42)  and  bound  min  by  the  geometric  mean 
to  find 


pU?qU?  < 


A+-A- 

gap{m) 


min 


-  P  2Y]  A+  -  A_  \ 

gap{l)  ’  gap{m)gap{l)  j 


<  ^  A+  -  A- 

gap{m)  gap{l) 


+  -A- 

pap 


We  see  that  (43)  is  an  instance  of  the  bound  claimed  in  the  theorem. 

The  analysis  for  j-values  in  which  0^+^  "  €  (A_,  A+)  is  the  dual  of  what  is 
given  above.  Attention  concentrates  on  qiJY  instead  of  p(i)^5  9^P{^)  replaces 
gap{m)  and  the  roles  of  /3j  and  /3j_i  are  exchanged.  By  Cauchy’s  Interlace 
theorem  either  €  (A_,A+)  or  €  (A-,A+)  for  each  j,  I  <  j  <  n, 

and  so  the  proof  is  complete. 


Remark.  The  analysis  shows  that  the  configuration  of  Ritz  values  has  to  be 
quite  special  to  yield  a  value  for  \p{j)q{j)\  as  large  as  0((A+  -  X-)lgapY^‘^- 
If  in  Case  1,  or  in  Case  2,  is  close  to  A+  or  to  then  the 

minima  in  (40)  and  (41)  are  (9((A+  -  X-)/gap)  and  that  same  bound  holds 
for  \p{j)qij)\-  Since  there  is  usually  a  little  freedom  in  the  choice  of  I  and  m 
we  can  expect  that  for  j  close  to  I  and  to  m 

ip0')?(ni  =  0  (^^^)  ■ 


It  follows  that 


Overlap{p,q)  :=  \p\  •  \g\  =  O 
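9  Accuracy  of  Subspaces 

Consider the basis {p, q} produced by the use of submatrices. Here p's support is 1 : m, and q's support is l : n, and l, m are chosen so that $|\beta_m p(m)|$ and $|\beta_{l-1} q(l)|$ are small. In addition $\|p\| = \|q\| = 1$. We consider how well span(p, q) approximates the invariant subspace for $\lambda_-$ and $\lambda_+$.

Let $\theta^{1:m} = p^*Tp$ be p's Rayleigh quotient and let $\theta^{l:n} = q^*Tq$ be q's Rayleigh quotient. We have

$$Tp = p\,\theta^{1:m} + e_{m+1}\beta_m p(m), \qquad Tq = q\,\theta^{l:n} + e_{l-1}\beta_{l-1}q(l). \tag{44}$$

There is no loss in shifting T to the mean $\mu$ of the two Ritz values $\theta^{1:m}$ and $\theta^{l:n}$. Also let

$$\delta = \frac{\theta^{1:m} - \theta^{l:n}}{2}.$$

There are two expressions for $\kappa = p^*(T - \mu I)q$:

$$\kappa = p^*q\,\delta + q(m+1)\,\beta_m p(m),$$

$$\kappa = -p^*q\,\delta + p(l-1)\,\beta_{l-1}q(l). \tag{45}$$

Also

$$[p, q]^*[p, q] = \begin{pmatrix} 1 & p^*q \\ q^*p & 1 \end{pmatrix}, \qquad [p, q]^*(T - \mu I)[p, q] = \begin{pmatrix} \delta & \kappa \\ \kappa & -\delta \end{pmatrix}.$$

Then

$$R := (T - \mu I)[p, q] - [p, q]\begin{pmatrix} \delta & \kappa \\ \kappa & -\delta \end{pmatrix}.$$

By (44) and (45),

$$R = [\,e_{m+1}\beta_m p(m) - q\kappa,\ \ e_{l-1}\beta_{l-1}q(l) - p\kappa\,].$$

Recall the supports of p and q and use (45) to find

$$R^*R = \begin{pmatrix} \beta_m^2 p(m)^2 - \kappa^2 + 2p^*q\,\delta\kappa & p^*q\,\kappa^2 \\ p^*q\,\kappa^2 & \beta_{l-1}^2 q(l)^2 - \kappa^2 - 2p^*q\,\delta\kappa \end{pmatrix}.$$

Since p and q are not orthogonal the proper measure of [p, q]'s residual is

$$\sigma_{\max} := \left\| R \begin{pmatrix} 1 & p^*q \\ p^*q & 1 \end{pmatrix}^{-1/2} \right\|.$$

Thus $\sigma_{\max}^2$ is the largest zero of

$$\det\left[ R^*R - \sigma^2 \begin{pmatrix} 1 & p^*q \\ p^*q & 1 \end{pmatrix} \right].$$

So

$$[1 - (p^*q)^2]\,\sigma^4 - [\beta_m^2 p(m)^2 + \beta_{l-1}^2 q(l)^2 - 2\kappa^2 - 2\kappa^2(p^*q)^2]\,\sigma^2 + \mathrm{const} = 0.$$

$\sigma_{\max}^2$ is majorized by the sum of the roots

$$\sigma_{\max}^2 \le \frac{\beta_m^2 p(m)^2 + \beta_{l-1}^2 q(l)^2 - 2\kappa^2[1 + (p^*q)^2]}{1 - (p^*q)^2}.$$

Now (45) may be rewritten as $2\kappa = \mathcal{L} + \mathcal{M}$, defining $\mathcal{L} = q(m+1)\beta_m p(m)$ and $\mathcal{M} = p(l-1)\beta_{l-1}q(l)$ in a natural way. Since

$$2\kappa^2 = \mathcal{L}^2 + \mathcal{M}^2 - \tfrac{1}{2}(\mathcal{L} - \mathcal{M})^2$$

we obtain

$$\sigma_{\max}^2 \le \Big\{ \beta_m^2 p(m)^2\,(1 - q(m+1)^2) + \beta_{l-1}^2 q(l)^2\,(1 - p(l-1)^2) + \tfrac{1}{2}[\beta_m p(m)q(m+1) - \beta_{l-1}q(l)p(l-1)]^2 \Big\} \cdot [1 + O((p^*q)^2)]. \tag{46}$$

This is an easily computed bound. The closer is $|q(m+1)|$ to $\|q\|_\infty$ and $|p(l-1)|$ to $\|p\|_\infty$ the lower is the bound on $\sigma_{\max}$. From standard gap theorems in [4] the sine of the error angle is less than $\sigma_{\max}/gap$, where gap is the separation of $[\lambda_-, \lambda_+]$ from the rest of the spectrum.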
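For two vectors the determinant condition is a quadratic in σ², so the residual measure can be evaluated in closed form. The sketch below is a hedged illustration, not part of the original analysis: the function names are hypothetical, and the inputs a, b, c stand for the diagonal and off-diagonal entries of R*R while g stands for the inner product p*q.

```python
import math

def sigma_max_squared(a, b, c, g):
    """Largest zero s of det([[a, c], [c, b]] - s * [[1, g], [g, 1]]).

    Expanding the determinant gives
        (1 - g^2) s^2 - (a + b - 2 c g) s + (a b - c^2) = 0.
    Requires |g| < 1, i.e. the two vectors are not parallel.
    """
    qa = 1.0 - g * g
    qb = a + b - 2.0 * c * g
    qc = a * b - c * c
    disc = max(qb * qb - 4.0 * qa * qc, 0.0)  # guard against rounding
    return (qb + math.sqrt(disc)) / (2.0 * qa)

def sum_of_roots(a, b, c, g):
    """Sum of both zeros; it majorizes the largest one when both are >= 0."""
    return (a + b - 2.0 * c * g) / (1.0 - g * g)

# Orthogonal vectors (g = 0) with a diagonal R*R: the answer is the
# larger diagonal entry.
s_diag = sigma_max_squared(3.0, 1.0, 0.0, 0.0)
# R*R equal to twice the Gram matrix: both zeros equal 2.
s_double = sigma_max_squared(2.0, 2.0, 1.0, 0.5)
```

The second example shows why the sum of the roots can overestimate by a factor of two in the worst case, which is harmless for an order-of-magnitude bound.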

The  General  Case 

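Given are $p_1, p_2, \ldots, p_\#$ with the support of $p_j$ on $(l_j : m_j)$. Also $p_i \cdot p_j = 0$ if $|i - j| > 1$ and $\|p_j\| = 1$, $j = 1, \ldots, \#$. By construction,

$$Tp_j = p_j\theta_j + e_{l_j-1}\beta_{l_j-1}p_j(l_j) + e_{m_j+1}\beta_{m_j}p_j(m_j), \qquad \theta_j = p_j^*Tp_j. \tag{47}$$

In order to simplify expressions it is convenient to shift T to the mean value $\mu$ of the Ritz values $\theta_1, \ldots, \theta_\#$. Let $\theta_i = \mu + \delta_i$, $i = 1, \ldots, \#$.

There are two expressions for $\kappa_i := p_i^*(T - \mu I)p_{i+1}$:

$$\kappa_i = \begin{cases} p_{i+1}\cdot p_i\,\delta_i + p_{i+1}(m_i+1)\,\beta_{m_i}p_i(m_i), \\ p_i\cdot p_{i+1}\,\delta_{i+1} + p_i(l_{i+1}-1)\,\beta_{l_{i+1}-1}\,p_{i+1}(l_{i+1}), \end{cases} \tag{48}$$

because of the disjoint supports.

Define

$$r_i := (T - \mu I)p_i - \delta_i p_i - \kappa_{i-1}p_{i-1} - \kappa_i p_{i+1} \tag{49}$$

where $\kappa_0 = \kappa_\# = 0$. Let $R = [r_1, \ldots, r_\#]$. By (47),

$$r_i = e_{l_i-1}\beta_{l_i-1}p_i(l_i) + e_{m_i+1}\beta_{m_i}p_i(m_i) - \kappa_{i-1}p_{i-1} - \kappa_i p_{i+1}.$$

Again the disjoint supports show that

$$r_i\cdot r_{i+1} = p_{i-1}\cdot p_i\,\kappa_{i-1}\kappa_i + p_i\cdot p_{i+1}\,\kappa_i^2 + p_{i+1}\cdot p_{i+2}\,\kappa_i\kappa_{i+1}, \tag{50}$$

$$r_i\cdot r_{i+2} = \kappa_i\kappa_{i+1} \tag{51}$$

and

$$r_i\cdot r_i = \beta_{l_i-1}^2 p_i(l_i)^2 + \beta_{m_i}^2 p_i(m_i)^2 + \kappa_{i-1}^2 + \kappa_i^2 - 2\kappa_{i-1}p_{i-1}(l_i-1)\beta_{l_i-1}p_i(l_i) - 2\kappa_i p_{i+1}(m_i+1)\beta_{m_i}p_i(m_i).$$

By (48),

$$\kappa_i^2 - 2\kappa_i p_{i+1}(m_i+1)\beta_{m_i}p_i(m_i) = [\kappa_i - p_{i+1}(m_i+1)\beta_{m_i}p_i(m_i)]^2 - p_{i+1}(m_i+1)^2\beta_{m_i}^2 p_i(m_i)^2 = (p_i\cdot p_{i+1}\,\delta_i)^2 - p_{i+1}(m_i+1)^2\beta_{m_i}^2 p_i(m_i)^2.$$

Thus, for $i = 1, 2, \ldots, \#$,

$$r_i\cdot r_i = (1 - p_{i-1}(l_i-1)^2)\,\beta_{l_i-1}^2 p_i(l_i)^2 + (1 - p_{i+1}(m_i+1)^2)\,\beta_{m_i}^2 p_i(m_i)^2 + (p_{i-1}\cdot p_i\,\delta_i)^2 + (p_i\cdot p_{i+1}\,\delta_i)^2 \tag{52}$$

with out of range terms set to zero when i = 1 and #. Let

$$J := \mathrm{tridiag}\begin{pmatrix} & p_1\cdot p_2 & p_2\cdot p_3 & \cdots & p_{\#-1}\cdot p_\# \\ 1 & 1 & \cdots & & 1 \\ & p_1\cdot p_2 & p_2\cdot p_3 & \cdots & p_{\#-1}\cdot p_\# \end{pmatrix}.$$

Then the measure of the quality of Range $[p_1, \ldots, p_\#]$ as an invariant subspace is given by $\sigma_{\max}$ where $\sigma_{\max}^2$ is the largest zero of

$$\det[R^*R - \sigma^2 J]. \tag{53}$$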

some  terms  in  (48),  (50),  and  (52)  are  much  smaller  than  others. 
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Thus  if  u;  :=  cluster  width 

+(1  -  pi+i{mi  +  l)^)/?m.P.K)^  + 

{R*R)i,i+i  =  0  +  C»(u;“/'‘) 

{R*R)i,i+2  =  KiK.+l- 

Using this approximation it is straightforward to approximate the largest zero of (53) by bisection. The largest diagonal entry of $R^*R$ is a reasonable approximation to $\sigma_{\max}^2$.

10  Counting  Ritz  Values 

In order to justify the selection of submatrices some background material is needed. Recall that $\theta_i^{l(j)m}$ denotes a Ritz value from $T^{l(j)m}$, the submatrix obtained by deleting row and column j from $T^{l:m}$.

•  Cauchy's Interlace theorem (true for symmetric matrices):

Let $\theta_i^{(j)} := \theta_i^{1(j)n}$. For each j = 1, ..., n,

$$\lambda_i \le \theta_i^{(j)} \le \lambda_{i+1}, \qquad i = 1, \ldots, n-1.$$

•  Unreduced  Tridiagonal  Interlace  theorem:  For  each  j  <  n,t  <  j,  and 
1  <  /  <  n  -  j  there  is  a  Ritz  value  in  the  closed  interval 

For  a  proof  see  [3]. 

•  The Window Count.

Let $I$ be any fixed closed interval on the real line. Let $\#_I(i : k)$ be the number of Ritz values of $T^{i:k}$ in $I$. Then $\#_I(1 : k)$ is 'nearly' monotone increasing with k. More precisely,

$$-1 \le \#_I(1 : k+1) - \max_{i \le k}\#_I(1 : i) \le 1, \qquad \text{for all } k.$$

This result is a direct corollary of the Tridiagonal Interlace Theorem.
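One common way to obtain window counts for all leading submatrices at once is a Sturm sequence (pivot sign) count: the number of negative pivots among the first k pivots of T - xI equals the number of Ritz values of the leading k x k submatrix below x. The sketch below uses that standard technique and is not taken from the paper; the function names are hypothetical, and it assumes that neither end point of the window is itself a Ritz value of any leading submatrix.

```python
def counts_below(alpha, beta, x):
    """counts[k-1] = number of eigenvalues of the leading k x k block below x.

    alpha: diagonal entries, beta: off-diagonal entries of a symmetric
    tridiagonal matrix (beta[i] couples rows i and i+1).
    """
    counts = []
    negatives = 0
    d = 1.0
    for i, a in enumerate(alpha):
        d = a - x - (beta[i - 1] ** 2 / d if i > 0 else 0.0)
        if d == 0.0:
            d = -1e-300  # nudge an exactly zero pivot; assumed not to occur
        if d < 0.0:
            negatives += 1
        counts.append(negatives)
    return counts

def window_counts(alpha, beta, lo, hi):
    """Window count of the leading k x k block in [lo, hi], for k = 1..n."""
    below_hi = counts_below(alpha, beta, hi)
    below_lo = counts_below(alpha, beta, lo)
    return [h - l for h, l in zip(below_hi, below_lo)]

# Example: order 10, diagonal 2, off-diagonal -1; the eigenvalues are
# 2 - 2 cos(k pi / 11), two of which lie in the window [0.3, 1.1].
counts = window_counts([2.0] * 10, [-1.0] * 9, 0.3, 1.1)
```

Successive entries of `counts` change by at most one, which is the weak form of the near monotonicity used above.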

Lemma 9 (Window Count) Let $I$ be the convex hull of a cluster of adjacent eigenvalues of unreduced, symmetric, tridiagonal $T = T^{1:n}$. If the window count $\#_I(1 : j)$ is well defined for j = 2, ..., n - 1 then

$$\#_I(1 : j-1) + \#_I(j+1 : n) = \#_I(1 : n) - 1, \qquad j = 2, \ldots, n-1.$$
Proof. Since the end points of $I$ are the extreme eigenvalues of the cluster
there are exactly $\#_I(1:n) - 1$ abutting subintervals in $I$ of the form
$[\lambda_i, \lambda_{i+1}]$. If the window count is well defined there are no
zero pivots in the triangular factorization (up or down) with shifts at the end
points of $I$. Hence no Ritz values of the submatrices $(1:j-1)$ or $(j+1:n)$
fall at the end points of $I$. By Cauchy’s Interlace theorem (one can assign a
Ritz value of $T$ with row and column $j$ deleted to each subinterval) there are
either $\#_I(1:n) - 1$ or $\#_I(1:n)$ Ritz values of $T$ with row and column $j$
deleted in $I$, for each $j$. By the ‘coincidence’ property of tridiagonals
there can only be $\#_I(1:n)$ Ritz values in $I$ if one of $I$’s end points is a
Ritz value and this is ruled out by the assumption on the window count.
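
The identity of the Window Count lemma can be checked on a small example. The sketch below is an illustration under stated assumptions, not part of the analysis: it assumes NumPy and a dense eigenvalue solver, the names `window_count` and `check_window_lemma` are hypothetical, the test matrix (zero diagonal, unit off-diagonals, order 10) is chosen here because none of its Ritz values falls on an end point of the interval, and the three largest eigenvalues merely play the role of a cluster. The end points are padded by a tiny amount so that the extreme eigenvalues are counted despite rounding.

```python
import numpy as np

def window_count(T, i, k, lo, hi):
    """Number of Ritz values of the submatrix (i : k) of T in [lo, hi] (1-based)."""
    if k < i:
        return 0
    theta = np.linalg.eigvalsh(T[i - 1:k, i - 1:k])
    return int(np.count_nonzero((theta >= lo) & (theta <= hi)))

def check_window_lemma(T, lo, hi):
    """Return #(1 : n) and the sums #(1 : j-1) + #(j+1 : n) for j = 2, ..., n-1."""
    n = T.shape[0]
    total = window_count(T, 1, n, lo, hi)
    split = [window_count(T, 1, j - 1, lo, hi) + window_count(T, j + 1, n, lo, hi)
             for j in range(2, n)]
    return total, split

n = 10
T = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
lam = np.linalg.eigvalsh(T)          # ascending order
pad = 1e-12                          # guard against rounding at the end points
lo, hi = lam[-3] - pad, lam[-1] + pad
total, split = check_window_lemma(T, lo, hi)
print(total, split)
```

Every entry of `split` should equal `total - 1`, in agreement with the lemma.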

11  Submatrix  Selection 

At present we have no preferred method for choosing submatrices automatically.
Below we present two methods that have been satisfactory so far.


Mid-Point  Selection 

$T$ and $I$ are given. $I$ is the convex hull of a cluster. Let $\# = \#_I(1:n)$.
Define, for $j = 0, 1, \dots, \#$,

$l_j = \max\{i : \#_I(1:i) = j\}$.  (54)

Define, for $j = 1, \dots, \#$, $m_j$ to be the mid-point of the range
$\{l_{j-1}+1, \dots, l_j\}$.

Note that $l_\# = n$. Let $l_{-1} = -1$. Take as initial submatrices

$(l_{j-2} + 2 : m_j)$, $j = 1, \dots, \#$.

Justification. Since $\#_I(1:i)$ is nearly monotone increasing in $i$ the
indices $\{l_j\}$ are strictly monotone increasing in $j$ thanks to the max in
their definition. Hence

$m_j \le l_j < l_j + 2$, $j = 0, 1, \dots, \#$.  (55)

Hence

$m_{j-2} \le l_{j-2} < l_{j-2} + 2$.
Thus the supports of the submatrices are disjoint except, possibly, for nearest
neighbors.
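
The indices defined in (54) can be tabulated directly from the leading window counts. The following Python sketch is a hedged illustration only: it assumes NumPy and a dense eigenvalue solver, the function names `window_count` and `last_index_with_count` are hypothetical, and the test matrix (zero diagonal, unit off-diagonals, order 10) with its three largest eigenvalues standing in for a cluster is an assumption made for the example. Only the indices of (54) are computed; the mid-points are not.

```python
import numpy as np

def window_count(T, i, k, lo, hi):
    """Number of Ritz values of the submatrix (i : k) of T in [lo, hi] (1-based)."""
    if k < i:
        return 0
    theta = np.linalg.eigvalsh(T[i - 1:k, i - 1:k])
    return int(np.count_nonzero((theta >= lo) & (theta <= hi)))

def last_index_with_count(T, lo, hi):
    """Largest i with #(1 : i) = j, for j = 0, ..., #(1 : n).

    An entry is None if the count j is never attained.
    """
    n = T.shape[0]
    counts = [window_count(T, 1, i, lo, hi) for i in range(1, n + 1)]
    result = []
    for j in range(counts[-1] + 1):
        hits = [i for i, c in enumerate(counts, start=1) if c == j]
        result.append(max(hits) if hits else None)
    return result

n = 10
T = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
lam = np.linalg.eigvalsh(T)
pad = 1e-12                      # keep the extreme eigenvalues inside the interval
ell = last_index_with_count(T, lam[-3] - pad, lam[-1] + pad)
print(ell)
```

In this example the indices are strictly increasing and the last one equals n.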

Next we show that there are ‘enough’ Ritz values in each submatrix. Since
$I$ is not the convex hull of the Ritz values of the submatrix $(l_{j-2}+2 : n)$
in $I$ the window count lemma is not applicable. By Cauchy’s Interlace theorem

$\#_I(l_{j-2}+2:n) - 1 \le \#_I(l_{j-2}+2:m_j) + \#_I(m_j+2:n) \le \#_I(l_{j-2}+2:n) + 1$.  (56)

However Lemma 9 may be invoked twice to obtain

$\#_I(m_j+2:n) = (\# - 1) - \#_I(1:m_j) = \# - j - 1$,  (57)

and

$\#_I(l_{j-2}+2:n) = (\# - 1) - \#_I(1:l_{j-2}) = \# - j + 1$.  (58)

Use (57) and (58) in (56) to obtain

$3 \ge \#_I(l_{j-2}+2:m_j) \ge 1$.  (59)

In the exceptional case that $\#_I(l_{j-2}+2:m_j) > 1$, for some $j$, the
support of the $j$th submatrix may be reduced from either or both ends until the
count is exactly 1.

There is a dual algorithm using trailing submatrices that delivers $m_j$,
$j = 1, \dots, \# + 1$ such that

$m_j = \min\{i : \#_I(i:n) = \# - j + 1\}$

and mid-points $l_j$, $j = 1, \dots, \#$, with
$l_j = \lfloor (m_j + m_{j+1})/2 \rfloor$. This process yields more balanced
submatrices

$(l_j : m_j)$, $j = 1, \dots, \#$

with the same bounds and disjoint support properties.

We used a selection very close to $\{(l_j : m_j)\}$ in 1989 before our analysis
of close pairs was developed. The performance was very satisfactory but there
is no reason why the mid-points in the ranges $\{l_{j-1}+1, \dots, l_j\}$ should
give the smallest coefficient of $|I|$ in the residual norm bounds. We want the
index $i$ in $\{l_{j-1}+1, \dots, l_j\}$ that gives a minimal, or small, value
to the $G_i$ of the Double Occupancy Theorem.
Selection by Pairs

The previous results for close pairs of eigenvalues may be used in a systematic
way to produce appropriate submatrices for isolated clusters containing
any number of eigenvalues. The goal here is to show the existence of the
submatrices, not to produce an efficient algorithm.

From Section 7 if $\#(1:m) = 2$ then there exist suitable indices $\mu$ and
$\nu$ (by no means unique) such that $\#(1:\nu) = 1$ and $\#(\mu:m) = 1$.
Usually $\mu < \nu$ but that is not necessary in what follows. There are basis
vectors with supports on $(1:\nu)$ and on $(\mu:m)$ whose residual norms are
proportional to the separation of the two Ritz values that cause
$\#(1:m) = 2$.

Now suppose that

$\# := \#(1:n) = \#_I(1:n) > 2$.

Let $h$ be maximal such that $\#(1:h) = 1$. For the submatrix $(1:h+1)$ let
the optimal indices ($\mu$ and $\nu$) be $j_2$ and $k_1$. The Ritz vector (with
Ritz value in $I$) for submatrix $(1:k_1)$ is the first basis vector. Set
$j_1 = 1$. The Ritz vector for $(j_2 : h+1)$ is not used but the index $j_2$
will play a role. Check that

$\#(j_2:n) = \#(1:n) - 1$.  (60)

If not, adjust $j_2$ until (60) holds. Having peeled off a submatrix
$(j_1 : k_1)$ from the top and then discarded rows $1 : j_2 - 1$ from $T$ we
proceed in the same way at the bottom of $T$.

Let $p$ be minimal such that $\#(p:n) = 1$. For the submatrix $(p-1:n)$
let the optimal indices be $j_\#$ and $k_{\#-1}$. The Ritz vector (with Ritz
value in $I$) for submatrix $(j_\#:n)$ is the final ($\#$th) basis vector. Set
$k_\# = n$. The Ritz vector for submatrix $(p-1:k_{\#-1})$ is not used but
$k_{\#-1}$ plays a role. Check that

$\#(j_2:k_{\#-1}) = \# - 2$.  (61)

If not, adjust $k_{\#-1}$ until (61) holds. If $\# - 2 > 2$ then repeat the
procedure just described on $(j_2 : k_{\#-1})$ to obtain two new submatrices
$(j_2 : k_2)$, $(j_{\#-1} : k_{\#-1})$ and, possibly, a remaining submatrix with
fewer Ritz values in $I$. Eventually one obtains $\#$ submatrices
$(j_i : k_i)$, $i = 1, \dots, \#$ each of which, by construction, has a simple
Ritz value in $I$. The associated Ritz vectors, with zeros appended to make
$n$-vectors, constitute a good basis for $I$’s invariant subspace.
Should it ever occur that $j_{i+1} < k_{i-1}$ we are at liberty to increase
$j_{i+1}$ or decrease $k_{i-1}$ a little subject only to the constraint that
$\#(j_i : k_i) = 1$ for each $i$. Our goal is to have

$k_{i-1} \le$ (position of the max entry in the $i$th basis vector) $\le j_{i+1}$, $i = 2, \dots, \# - 1$.

Selection  by  Envelope 

A way to choose submatrices is suggested by Section 6. First find the envelope
vector $\mathcal{E}$ and then find $\#$ entries of $\mathcal{E}$ that are local
maxima. Suppose first that there is a unique set of such positions
$k_1, k_2, \dots, k_\#$. For $1 < j < \#$ the $j$th submatrix is

$(k_{j-1} + 1 : k_{j+1} - 1)$.  (62)

Usually the first and the last are $(1 : k_2 - 1)$ and $(k_{\#-1} + 1 : n)$.
However in general we must be more careful and define $k_0$ and $k_{\#+1}$. Let
the first nonnegligible entry of $\mathcal{E}$ be in position $k_0 + 1$ and let
the last nonnegligible entry be in position $k_{\#+1} - 1$. Now (62) also gives
the submatrices for $j = 1$ and $j = \#$.
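
The index bookkeeping in (62) can be sketched in a few lines of Python. This is an illustration under assumptions, not an implementation of the envelope computation: the envelope values and the summit positions are taken as given inputs, the function name `envelope_submatrices` and the sample data are hypothetical, each summit is assumed to be a single index, and negligible entries are taken to be those not exceeding macheps times the 2-norm of the envelope.

```python
import numpy as np

def envelope_submatrices(env, peaks, macheps=np.finfo(float).eps):
    """Index ranges (k_{j-1} + 1 : k_{j+1} - 1), j = 1, ..., #, as in (62).

    env   : envelope entries, position 1 first (1-based positions in the text)
    peaks : 1-based positions k_1 < k_2 < ... < k_# of the summits
    """
    env = np.abs(np.asarray(env, dtype=float))
    thresh = macheps * np.linalg.norm(env)
    big = np.flatnonzero(env > thresh) + 1        # 1-based nonnegligible positions
    # k_0 + 1 is the first and k_{#+1} - 1 the last nonnegligible position.
    k = [int(big[0]) - 1] + [int(p) for p in peaks] + [int(big[-1]) + 1]
    return [(k[j - 1] + 1, k[j + 1] - 1) for j in range(1, len(peaks) + 1)]

# Hypothetical envelope of length 8 with summits at positions 3, 5 and 7.
env = [0.0, 0.5, 1.0, 0.3, 0.9, 0.2, 0.8, 0.1]
subs = envelope_submatrices(env, [3, 5, 7])
print(subs)
```

In this small example each submatrix shares a single index with its nearest neighbor, and submatrices that are not neighbors are disjoint.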

In case the location of the $i$th summit is given by several indices then take
$k_i$ to be that set and interpret $k_i + 1$ as $1 + \max k_i$ and $k_i - 1$ as
$-1 + \min k_i$.
To  quantify  the  adjective  negligible  we  propose  a  threshold  of  macheps  *  \\S\\ 
=  macheps  •  \/^* 

The approximation of $\mathcal{E}$ is an implementation issue that will not be
addressed here.


12  More  Examples 

A  Glued  Wilkinson  Matrix 

In  Section  5  we  studied  .  Here  we  use  VFjt  but  take  4  copies  and  connect 
them by an off-diagonal entry e which is called ‘the glue’ in the matrix $W_{100}$.
In  an  obvious  extension  of  our  notation  in  Section  2 

VFioo  =  tridiag  ^  VFjt  ^25  ^  • 
If e is too small, like 10"^, then $W_{100}$ is too close to a direct sum of 4 matrices
and  calculation  of  orthogonal  eigenvectors  is  not  hard.  If  e  is  too  large  (>  2) 
then  the  eigenvalues  are  sufficiently  well  separated  to  be  treated  as  isolated. 
We  use 

e  =  0.3 

and show the largest 8 computed eigenvalues in Table 2. Without the use of
Gram-Schmidt orthogonalization inverse iteration gives neither small residuals
nor adequate orthogonality.
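
A glued matrix of this kind can be assembled as follows. The Python sketch is offered only as an illustration and rests on assumptions that should be checked against Section 5: it takes each block to be the Wilkinson matrix of order 25 with diagonal 12, 11, ..., 1, 0, 1, ..., 11, 12 and unit off-diagonals, places the glue in the single off-diagonal position between consecutive blocks, and uses NumPy with a dense eigenvalue solver. The function names `wilkinson_plus` and `glued_wilkinson` are hypothetical.

```python
import numpy as np

def wilkinson_plus(m):
    """Diagonal and off-diagonal of the assumed block of order 2m + 1."""
    diag = np.abs(np.arange(-m, m + 1)).astype(float)   # m, ..., 1, 0, 1, ..., m
    off = np.ones(2 * m)
    return diag, off

def glued_wilkinson(m=12, copies=4, glue=0.3):
    """Dense tridiagonal matrix made of `copies` blocks joined by `glue`."""
    d, off = wilkinson_plus(m)
    diag = np.tile(d, copies)
    # Each block contributes its off-diagonal followed by one glue entry;
    # the glue after the last block is dropped.
    offdiag = np.concatenate([np.append(off, glue)] * copies)[:-1]
    return np.diag(diag) + np.diag(offdiag, 1) + np.diag(offdiag, -1)

W = glued_wilkinson()
largest = np.linalg.eigvalsh(W)[-8:]     # the 8 largest eigenvalues, ascending
print(largest)
```

The printed values can be compared with Table 2; agreement is to be expected only if the assumed block really is the one used in the text.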


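Eigenvalue           Computed value
$\lambda_{93}$       12.577864
$\lambda_{94}$       12.577870
$\lambda_{95}$       12.577881
$\lambda_{96}$       12.746191
$\lambda_{97}$       12.746193
$\lambda_{98}$       12.939114
$\lambda_{99}$       12.939115
$\lambda_{100}$      12.939117

Table 2: Selected Eigenvalues of $W_{100}$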


Figures 6, 7 and 8 show the submatrix indices used and the basis vectors
they  yield.  The  vectors  are  plotted  on  a  logarithmic  scale  with  the  correct 
sign  attached.  All  entries  less  than  10"®  are  treated  as  0.  The  reason  for 
using log scale is to focus attention on the smaller entries.

In  this  example  the  submatrices  overlap  by  only  one  or  two  indices.  The 
residual  norms  are  the  magnitudes  of  the  first  and  last  entries  and  are  all  less 
than  macheps  •  spread.  The  dot  products  between  vectors  in  each  group  are 
almost  zero  because  the  supports  are  almost  disjoint.  On  the  other  hand  the 
supports  of  the  vectors  0:93  and  *93  are  identical  and  orthogonality  comes 
from  cancellation.  These  dot  products  are  less  than  30  •  macheps. 

An  Example  from  the  Lanczos  Algorithm 

The Lanczos Algorithm with no reorthogonalization was run in double
precision on a diagonal matrix of order 205

$D = \mathrm{diag}(1, 2, \dots, 200, 400, 400, 400, 400, 600)$

with  starting  vector 

e  =  (l,l,.. 

The run stopped at step 87 and the resulting tridiagonal matrix $T_{87}$
had 5 copies of 600, four copies of 400, and a single eigenvalue at 200, to
single precision.

Figure  9  shows  the  four  vectors  corresponding  to  the  cluster  at  400.  All 
these  calculations  were  in  single  precision.  Note  that  the  overlap  of  the 
supports  is  greater  than  in  the  glued  Wilkinson  matrix.  However  all  nonzero 
products were about 10"^**, much less than macheps.
Figure 1: Vectors $z_+$ and $z_-$ for the pair near 6 on a log scale
Figure 9: Eigenvectors $x_{79}$, $x_{80}$, $x_{81}$, $x_{82}$ for $T_{87}$ on a log scale


